Vol. 43, Parts 3 and 4                                December 1956


BIOMETRIKA


FOUNDED BY
W. F. R. WELDON, FRANCIS GALTON AND KARL PEARSON


MANAGING EDITOR 


E. S. PEARSON


ASSOCIATE EDITOR 
M. G. KENDALL 


in consultation with 
J. B. S. HALDANE


H. O. HARTLEY
D. G. KENDALL 


HARALD CRAMER 
F. N. DAVID 
R. C. GEARY 


ISSUED BY 
THE BIOMETRIKA OFFICE, UNIVERSITY COLLEGE, LONDON 


PRINTED AT THE UNIVERSITY PRESS, CAMBRIDGE 


[Issued 4 December 1956]











A volume of Biometrika containing about 500 pages, with plates and tables, 
will be published annually in two half-yearly issues.
Papers for publication should be sent either to 


PROFESSOR E. S. PEARSON 
Department of Statistics, University College, London, W.C. 1 


or if more convenient to 


PROFESSOR M. G. KENDALL 
London School of Economics, Houghton Street, London, W.C. 2 


It is a condition of publication in Biometrika that the paper shall not already 
have been issued elsewhere, and will not be reprinted without leave of the 
Editors. 

Contributors receive 25 copies of their papers free. Joint authors 15 copies 
each. Order forms for separates are sent to authors with proofs of their 
papers. 

The Subscription price, payable in advance, is: 

Inland: 45s. net per volume Abroad: 54s. net per volume 
including packing and postage 
BACK ISSUES 
Volumes 20-43 
These may be obtained from the BIOMETRIKA OFFICE, at the following 
prices: 
Vols. 20-38: £5 per volume Vols. 39-43: £3. 3s. per volume 
packing and postage 2s. per volume 
Bound volumes: £1 extra per volume Binding cases: 10s. each 


Cheques should be made payable to Biometrika, crossed “a/c BIOMETRIKA 
TRUST” and sent to 
THE SECRETARY 


BIOMETRIKA OFFICE 
UNIVERSITY COLLEGE 
GOWER STREET, LONDON, W.C.1 


to whom all orders for series, single copies and offprints should be addressed. 
All foreign cheques must be drawn in sterling and on a Bank having a 
London Agency. 








Volumes 1-19 
Permission for reprinting has been granted to Messrs Wm. Dawson & 
Sons Ltd. Volumes 1-13 are now ready for distribution and Volumes 
14-19 are in course of preparation. Would librarians and others wishing
to have copies please place their orders now with: 


Wm. DAWSON & SONS LTD., 


4 DUKE STREET, MANCHESTER SQUARE, 
LONDON, W.1 






































VOLUME 43, PARTS 3 AND 4 DECEMBER 1956





STUDIES IN THE HISTORY OF PROBABILITY AND STATISTICS 
III. A NOTE ON THE HISTORY OF THE GRAPHICAL PRESENTATION OF DATA


By ERICA ROYSTON 


Division of Research Techniques, London School of Economics and Political Science


1. The cartoon of the business man despondently watching his production curve dis- 
appearing off the chart on the wall and down through the floor into the office below appears 
to most people to be a joke that has worn slightly thin. Nevertheless, the very fact that 
graphical representation can become the subject of a cartoon shows how completely it has 
come into everyday usage. The graph is now generally accepted as one of the clearest and 
most effective ways of presenting data, whether for consumption by the layman or the 
specialist. Many volumes have been published on the art of graphical presentation, on the 
appropriate method to be used for any specific purpose and on the advantages and dis- 
advantages of various approaches to the subject. It seems strange therefore that so little 
curiosity has been displayed regarding the historical origins of the graphical representation 
of data.* In fact, these historical origins are not at all clear. 


2. As in most other historical studies of this nature it is virtually impossible to state 
categorically that a given method was introduced by any one person or at any one moment 
of time. The most that one may hope to do is to show that such a method was being used at 
a certain time and to investigate the author's claims, if any, to originality. Apart from this, 
some of the underlying concepts can be traced back to their possible origins. 


3. For the purpose of this note the graphical representation of data will be taken to mean 
the geometric and graphical presentation of factual data as distinct from the graphical 
plotting of mathematical functions. Although the development of the technique of using 
Cartesian co-ordinates with one axis representing time seems to date back only a few cen- 
turies, the spatial representation of time originated much earlier. There exist instances of 
the movement of stars, or, more precisely, the inclinations of the planetary orbits being 
plotted as a function of time, as early as the tenth century (Funkhouser, 1936), and the 
emergence of written music represents one of the earliest instances of the use of time-series.†


4. The basic idea of using co-ordinates to determine the location of a point in space dates
back to the Greeks at least, although it was not until the time of Descartes that mathe-
maticians systematically developed the idea. For the purpose of tracing the origins of
graphical representation the emergence of the concept of plotting mathematical functions


* While holding the part-time appointment of Professor of Geometry at Gresham College (1890-4)
Karl Pearson gave a series of twelve lectures under the heading ‘The Geometry of Statistics’. The
synopsis which survives refers to Playfair as the father of the subject but the lectures themselves
appear to have been lost. See E. S. Pearson (1938, p. 142).

† Neumes (νεύματα), the basis of present-day musical notes, were at first only approximately
measured in terms of time. The duration of notes was not fixed precisely until the twelfth and thirteenth 
centuries, when the so-called Franconian reform (after Franco of Cologne’s Ars Cantus Mensurabilis) 
took place. Together with the gradual introduction of the bar and a fixed beat in the bar, music became 
what may be termed a true time-series, where notes were graphically presented, using the tone as 
ordinate against the time as abscissa. 



of the type y=f(x) is important. The most usual form of time-series is, in fact, little more 
than this, except that usually some form of frequency is plotted as an observed, not neces- 
sarily strictly mathematical, function of time. Cartesian co-ordinates are therefore one of 
the foundations on which the modern graph of data is based. It is interesting to note in 
passing that William Playfair (1801) justifies plotting money against time as follows: 

This method has struck several persons as being fallacious because geometrical measurement has 
not any relation to money or to time yet here it is made to represent both. The most familiar and 
simple answer to this objection is that if the money received by a single man in trade were all guineas 
and every evening he made a single pile of all the guineas received during the day, its height would 


be proportioned to the receipts of that day, so that by this plain operation time, proportion and amount 
would be physically combined. 


As will be shown, however, this picturesque description is merely an explanation to the 
layman, for Playfair himself almost certainly approached the subject as an extension of 
the use of rectangular co-ordinates, which by his time were familiar in mathematics. 


5. One of the early users of graphical representation in statistics was A. F. W. Crome. He 
was born in Germany in 1753, the third of twenty children. His father was a clergyman 
and, following in his footsteps, Crome studied theology. During his studies he acted as 
tutor to the children of General von Holzendorf and later to those of Karl Alexander von 
Bismarck. He passed his examinations in 1771 and in 1783 became lecturer in geography 
and history at Dessau. In 1783 he became a tutor to the 16-year-old Prince of Dessau, and 
finally in 1786 he was made ordinary Professor of Statistics and Public Finance at Giessen 
University, where he remained till he retired in 1831. He died two years later. 


6. With an academic background such as this it is hardly surprising to learn that Crome 
first evolved his system of geometric representation as an aid to teaching. Crome was, in 
modern terminology, an economic geographer rather than a statistician. His works are 
descriptive rather than analytical. Although the Allgemeine Deutsche Biographie calls him a
‘pioneer in statistics’, it is probably using the word in its original meaning, i.e. as data
referring to states. Thus the adjective ‘statistical’, which appears in nearly all the titles
of his works, e.g. Geographisch-statistische Darstellung der Staatskräfte, merely seems to
indicate that some numerical data are given. For example, in Über die Grösse und
Bevölkerung der sämtlichen Europäischen Staaten (1785), Crome gives a detailed description
of the geographical ‘vital data’ of most European states, chiefly population figures and areas.
To make the importance of such data clearer he devised his Grössen Karte which was of
much the same type as that illustrated in Fig. 1. (The latter is taken from a similar treatise 


dealing with the German states.) He justifies the use of this geometrical representation as 
follows (1785): 


If therefore one does not wish to limit one’s knowledge of geography to knowing the names of the 
cities, provinces, rivers and monarchs, but also wishes to include an overall view of the condition and 
might of the various empires and states in order to get a good grasp of their might, size and culture, 
and to compare these among themselves, one must get the largest possible overall view of the area 
and population of the European states—this is the purpose of the attached chart. 

On the one hand the eye can take in some idea of the relative size of the European states by just 
looking at a general map of this part of the world. But on the other hand the outline of the states is 
such that they cannot easily be compared. 

...The proportions of the different sizes can however be more easily seen and grasped if they are 
brought before the eye in the form of a drawing, because the imagination is thus stimulated, than if 


these merely appeared in the form of numbers, especially when these consist of many digits as is often 
the case with areas of states.... 






7. The method Crome employed in these charts was simple. He expressed the area of 
each of the states with which he was dealing as a square proportional to that area. He 
then drew these squares inside each other in such a manner as to keep the vertical sides of 
the squares, which are of course proportional to the square root of the area represented, in 
scale. When the area of a state was relatively much larger than the others he broke the 
scale in much the same manner as is done on modern graphs. 
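
The construction described in this paragraph can be put in a few lines of Python. This is only a sketch under stated assumptions: the state labels and areas are made-up figures for illustration and are not Crome's data, and the function name square_sides is our own.

```python
import math

def square_sides(areas, scale=1.0):
    """Return the side of the square drawn for each state.

    Each square has an area proportional to the state's area, so its
    side is proportional to the square root of that area.
    """
    return {name: scale * math.sqrt(area) for name, area in areas.items()}

# Hypothetical areas in square miles (illustrative only, not Crome's figures).
areas = {"A": 10_000, "B": 2_500, "C": 400}
sides = square_sides(areas, scale=0.1)

# List the squares from largest to smallest, as they would be nested.
for name, side in sorted(sides.items(), key=lambda kv: -kv[1]):
    print(f"{name}: side {side:.1f}")
```

A state four times as large as another (A against B here) receives a square only twice as wide, which is one reason why a reader judging by the vertical sides alone may underestimate the differences in area.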


8. The idea of representing data in this way seems to have been thought completely new, 
for Crome went into great detail in describing the uses and misuses of his charts. For 
example, he warns his readers that there is no justification for assuming that, because the 
squares were drawn one inside the other, the outside square, or rather the country repre- 
sented by it, was larger than all those within it added together. It appears that Crome was 
led to the idea by the comparison of geographical areas on maps; in fact, he used 
spatial representation of magnitude, not the linear magnitudes moving through time 
of the Cartesian approach. 

There is, however, one example of graphical representation being used by one of Crome’s 
contemporaries and compatriots, namely, G. R. von Gothe’s Höhen Tableau (Altitude
Table) (1813). This is an imaginary panoramic scene, complete with trees and animals, 
where most of the highest known mountains of Europe and America appear in proportion, 
their altitudes being given by reference to a scale on the vertical axis. 


9. In 1817 Crome wrote: 


I have used this method with great success in my 35 year career as Docent. To this end I published
in 1782 my first Produkten Karte of Europe, which was followed in 1785 by the Grössen Karte. This
was issued in an improved form in 1792 as the Verhältniss Karte of Europe.


It seems that the Produkten Karte referred to differed little from the later ones,* and it is
a reissue of the Verhältniss Karte that is illustrated here as Fig. 1. It appears as part of a
much larger chart which also gives lists of data referring to military strength, etc. 


10. On the base of this chart, published in 1820, is a diagram that Crome seems to have 
used here for the first time. An extract of it is shown as Fig. 2. It consists basically of a series 
of circles each representing the density of population in the state to which it refers, the 
density being inversely proportional to the area of the circle. The various half-tangents
and extended radii denote total population and national income, the lines being drawn in
different colours to distinguish them from one another. Thus total population is denoted 
as follows: 

The tangent on the right-hand side of several circles indicates the population in millions 
as measured on the scale, the radii leaning to the right indicate 100 thousands, and the 
thin black lines also leaning to the right thousands. (N.B. The length of these slanting lines 
is misleading, as the quantity is indicated by the number of sections of the vertical scale 
crossed.) These quantities are then added together to give the total population. In fact, 
each of these lines represents a group of digits from the original figure. 

Thus, for example, Luxemburg (third from left) has a population density of 1981 per 
German square mile (from the text below the circle). Its population is indicated by the 


* This chart is missing in the copies of Crome’s works available both in the British Museum and the 
British Library of Political Science. 




extended radius sloping to the right which shows 2-5 units, the units in this case being 
100,000 (see above). Total population is thus 250,000. The national income is shown as 
follows: 

The tangent on the left indicates 1 unit, i.e. 1 million Gulden, the extended radius sloping 
to the left indicates 5 units, i.e. 5 × 100,000 Gulden. Total national income is thus
1,000,000 + 500,000 = 1,500,000 Gulden. National income per head, as indicated by the 
radius extended downwards is 6 Gulden. 
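
The reading of the Luxemburg circle given above amounts to a small piece of arithmetic, which may be checked as follows. This is a sketch only: the function total_from_digit_groups and its argument names are our own labels for the tangent, radius and thin-line readings, and the figures are those quoted in the text.

```python
def total_from_digit_groups(millions=0.0, hundred_thousands=0.0, thousands=0.0):
    """Add up the three line readings, each standing for a group of digits."""
    return millions * 1_000_000 + hundred_thousands * 100_000 + thousands * 1_000

# Luxemburg, as read off Fig. 2 in the text.
population = total_from_digit_groups(hundred_thousands=2.5)      # radius to the right
income = total_from_digit_groups(millions=1, hundred_thousands=5)  # tangent and radius on the left
per_head = income / population

print(f"population:      {population:,.0f}")
print(f"national income: {income:,.0f} Gulden")
print(f"income per head: {per_head:.0f} Gulden")
```

The quotient agrees with the 6 Gulden per head that Crome shows separately on the radius extended downwards, so that the diagram is at least internally consistent for this state.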


11. Although a little involved and misleading and on the whole giving only a very poor 
overall picture, this method is perhaps more precise than if the population and national 
income figures had all been plotted to scale. Basically this could be taken as a rather 


crude attempt at a bar chart, the circles being merely a generalization of the squares Crome 
used earlier. 


12. Crome nowhere explicitly claims that he invented the methods he used, although 
implicitly he does so several times, as, for example, in the last passage quoted above. 
Nevertheless, this last method of circles, although it could merely have been a generalization
of the squares, seems similar enough to that used by Playfair (1801) a few years earlier


(Fig. 3) to warrant closer examination of the possibility that Crome did get some of his 
ideas from the latter. 


13. William Playfair was born in 1759 near Dundee, and was also the son of a clergyman. 
His father died when William was 13, his brother John, who later became famous as a result 
of his work in the field of mathematics and geology, taking charge of the family. William 
Playfair was apprenticed to Andrew Meikle, inventor of the threshing machine. In 1780 
he became a draughtsman for Boulton and Watt, and while with them he patented two 
inventions, the ‘Eldorado sash’ and a machine for making fretwork on silver tea-trays. 
Backed by these inventions Playfair opened a shop in London, which soon failed. In 1789 
he succeeded Joel Barlow as agent of the Scioto (Ohio) Land Company, a rather unsavoury 
concern which sold shares in land which may or may not have existed. While in Paris
Playfair probably helped to capture the Bastille; he is known also to have saved Duval from 
the mob in 1791. In the next year allegations concerning his ‘mismanagement’ of the Ohio 


Company, coupled with some rather plain speaking against the Revolutionaries, forced 
Playfair to leave Paris for Frankfurt. 


14. Some of Playfair’s doings appear a little odd. For example, in 1793, while he was in Frankfurt,
a French émigré gave him details of a system of signalling by means of the semaphore
system. He quickly made models of the apparatus—a large clock-like dial, several of which 
were to be erected across the country to form a chain whereby messages could be relayed— 
and sent them to the Duke of York. Thereafter he claimed to have introduced the system to 
England. In actual fact, R. L. Edgeworth (1797) had already used it in 1767 to relay the results
of a race at Newmarket to his home some miles away. On his return to London, Playfair 
opened a ‘Security Bank’ to facilitate small loans by subdividing large securities. This very 
soon collapsed, and from 1795 Playfair lived only by writing. Among other things he claimed 
to have warned the government of Napoleon’s intention to escape from Elba. After Waterloo 
he returned to Paris to edit Galignani’s Messenger. While there he got involved in a case of
libel, and to escape the sentence of three months’ imprisonment he fled to London. The 
remaining years before his death in 1823 he spent in London writing pamphlets. 







15. There is nothing of the studious or academic background of Crome here, and it is 
therefore rather surprising that what Playfair wrote was good sound economics and, 
moreover, that his works were illustrated with some extremely good graphs, histograms 
and pie diagrams. He wrote mainly works on general descriptive economics, one of his 
favourite subjects being the international balance of trade. Most of the graphs he used were 
plotted time-series, chiefly export and import figures expressed in millions of pounds; but 
he also used circles to present spatial magnitudes after the manner of Crome. His earliest 
work containing such graphs seems to be his Commercial and Political Atlas, published in 
1786. This consists of graphs of exports and imports represented by coloured lines, the gap 
between them being termed the balance in favour of England and usually also coloured. 
He produced such graphs for trade to and from most of Britain’s foreign markets. In the 
same book there are also some very good bar diagrams showing the amount of exports to 
and imports from each country plotted against time. 


16. Playfair’s professed aim in publishing his charts was to make statistics, presumably 
again in the sense of data relating to state-craft, a little more palatable. Statistics, one may 
gather from the following extract from Playfair’s Statistical Breviary (1801), were not 
thought of very highly. 

...for no study is less alluring or more dry and tedious than statistics, unless the mind and imagina- 
tion are set to work or that the person studying is particularly interested in the subject; which is 
seldom the case with young men in any rank in life. 

17. Most of Playfair’s later diagrams were merely improvements of his earlier work. 
Fig. 4 shows one such graph where two series have been superimposed upon one another, 
one as a graph and the other as a histogram (the two series here being weekly wages and the 
price of a quarter of wheat (1821)). He illustrated his British Family Antiquity with several 
beautifully executed bar diagrams, which differed from the true frequency graphs used 
elsewhere by the fact that only the presence or absence of a given factor is plotted against 
time. This type of diagram, Playfair conceded, had long been used in chronology. 


18. Fig. 3 shows an extract from an earlier example of Playfair’s work. Here the areas 
of European states are expressed as circles and in some cases subdivided as pie diagrams. 
The line on the left-hand side of the circles indicates population in millions and that on the 
right-hand side national revenue in millions of pounds. The slope of the line joining these two 
is intended to indicate the approximate ratio of one to the other. Playfair nowhere states this 
and is presumably aware of the fact that the differing diameters of the circles make any 
exact comparison impossible. This particular figure has been mentioned earlier as closely 
resembling Crome’s work (Fig. 2) and therefore warranting the examination of the possi- 
bility of there being some connexion between the two writers. Certainly this was possible, 
for Crome and Playfair were contemporaries and, moreover, both wrote in the same field. 
Playfair’s diagram was, however, published before Crome’s, and it seems unlikely that, 
having had sufficient flair to produce perfect time-series graphs, he should have copied 
Crome. Conversely, however, it seems equally unlikely, even were it possible, that Crome 
should have taken a basically clear and simple diagram of Playfair’s and made such a com- 
plicated diagram as Fig. 2 out of it—unless, of course, he was working under the impression 
that he was improving on Playfair’s work. On the face of it, with no direct evidence either 
way, it seems that if there was any connexion between the two it was Crome who was 
copying from Playfair and not vice versa. But it is at least equally possible that the two 
worked entirely independently. 













19. There is nothing crude or clumsy about Playfair’s work and it seems surprising that 
one man should have developed it to such a high pitch—for, unlike Crome, Playfair makes 
definite claims to originality, calling himself the ‘inventor of linear arithmetic’ as he termed 
graphical representation. In 1796, for instance, he writes: 


I confess I was long anxious to find out whether I was actually the first who applied the principles 
of geometry to matters of finance as it has long been applied to chronology with great success. I am 
now satisfied, upon due enquiry, that I was the first, for during the 11 years I have never been able to 
learn that anything of a similar nature had ever before been thought of...


and later (1805): 


The impression is not only simple, but it is as lasting in retaining as it is easy in receiving. Such 
are the advantages claimed for the invention 20 years ago, when it first appeared. The claim has been 
allowed and not objected to so far as the inventor knows, either in this or in any other country. 


And as an explanation of how he came to think of such graphical representation (1805):


I think it well to embrace this opportunity, the best I have had, and perhaps the last I ever shall 
have, of making some return (as far as acknowledgement is a return) for an obligation, of a nature to 


be repaid by acknowledging publicly, that to the best and most affectionate of brothers I owe the 
invention of these charts. 

At a very early period of my life, my brother, who in a most exemplary manner maintained and 
educated the family his father left, made me keep a register of a thermometer, expressing the variation 
by lines on a divided scale. He taught me to know that whatever can be expressed in numbers may be 
represented by lines. The chart of the thermometer was on the same principle with those given here, 
the application only is different. The brother to whom I owe this now fills the Natural Philosophical 
Chair at the University of Edinburgh. 


20. Were it not for the somewhat doubtful background revealed by his biographers, 
including among other matters the bogus claim concerning the semaphore, this sounds 
honest enough. Certainly there did not seem to be any counterclaims made by his con- 
temporaries, for Playfair’s work was acclaimed both in this country and in France, where 
even the Academy of Sciences ‘testified its approbation of this application of geometry to 
accounts’ (Playfair, 1798). 


21. Apart from the influence exerted on Playfair by his brother, there is, however, 
another clue to be found in his biography. In 1780 Playfair worked as a draughtsman to 
Boulton and Watt. Now Watt is well known as the first engineer who illustrated in diagram- 
matic form the work done in a steam-engine cylinder by graphing pressure against volume, 
and one of his draughtsmen would undoubtedly have come into contact with graphical 
methods of presenting many kinds of motion. It seems very possible that Playfair’s real 
originality lay, not in the kind of diagram he used, but in its application to descriptive 
statistics and economics. 


22. On the whole, therefore, it seems quite possible that William Playfair, if not the 
‘inventor’ of graphical representation as we know it to-day, was the first to introduce it 
into statistics; and that Crome independently had much the same ideas so far as concerns 
spatial presentation, but did not carry them out quite as well. The only reasons for suspecting 
Playfair’s authenticity are his occasional dubious activities in other fields, and in the circumstances
it is perhaps advisable to accord him the benefit of the doubt.


Plate 1

Figs. 1 and 2. Extracts from Crome’s Geographisch-statistische Darstellung, 1820.

Plate 2

Fig. 3. Extract from Playfair’s Statistical Breviary, 1801.

Fig. 4. Extract from Playfair’s A Letter on our Agricultural Distress, 1821.


23. Little that Playfair did would seem to have been beyond the capabilities or scope of
men like Graunt, Petty or Halley a century earlier. It seems somewhat surprising that the
graphical presentation of statistics should have had to wait until the end of the eighteenth
century. The explanation may well be that until the middle of that century there was very
little to present. To quote Playfair himself (1801):


Statistical knowledge, though in some degree searched after in the most early ages of the world, has 
not till within these last 50 years become a regular object of study. 


REFERENCES 


Crome, A. F. W. (1785). Über die Grösse und Bevölkerung der sämtlichen Europäischen Staaten. Leipzig.

Crome, A. F. W. (1820). Geographisch-statistische Darstellung der Staatskräfte. Leipzig.

Edgeworth, R. L. (1797). A Letter to the Rt. Hon. Earl of Charlemont on the Tellograph. Dublin.

Funkhouser, H. G. (1936). A note on a 10th century graph. Osiris, 1. Bruges.

Pearson, E. S. (1938). Karl Pearson. Cambridge University Press.

Playfair, W. (1786). The Commercial and Political Atlas. London.

Playfair, W. (1796). For the Use of the Enemies of England. London.

Playfair, W. (1798). Linear Arithmetic. London.

Playfair, W. (1801). Statistical Breviary. London.

Playfair, W. (1805). An Enquiry into the Permanent Causes of the Decline and Fall of Powerful
Nations. London.

Playfair, W. (1808). British Family Antiquity. London.

Playfair, W. (1821). A Letter on our Agricultural Distress. London.

von Gothe, G. R. (1813). Höhen Tableau. Allgemeine Geographische Ephemeriden, 41. Weimar.













STUDIES IN THE HISTORY OF PROBABILITY AND STATISTICS 
IV. A NOTE ON AN EARLY STATISTICAL STUDY OF LITERARY STYLE 


By C. B. WILLIAMS, Sc.D., F.R.S. 


In Biometrika for January 1939, G. Udny Yule discussed the frequency distribution of 
sentence length in samples of the writings of different authors. After showing that each 
author had a fairly characteristic distribution, he turned to the value of the method in 
cases of uncertain or disputed authorship. Thus, in the case of De Imitatione Christi, he
showed that the frequency distribution of sentences with different numbers of words more
closely resembled that of works by Thomas à Kempis than that of works by de Gerson.

In Biometrika for March 1940 I showed that the skew distribution found by Yule could 
be brought almost to a symmetrical form by using a geometric or logarithmic scale for the 
number of words per sentence, thereby simplifying the mathematical comparisons. In 
this note I mentioned that some years previously (about 1935) I had made a number of 
frequency distributions from different authors using the number of letters per word as the 
variable, but that I had not found any striking differences. I considered Yule’s use of the 
number of words per sentence as a better technique, giving a greater range of possible 
variation and comparison. 
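
The effect of the logarithmic scale can be illustrated numerically. The sketch below is an illustration under stated assumptions and not a reconstruction of the 1940 calculation: the sentence lengths are synthetic, drawn from a lognormal distribution chosen for convenience, so the near-symmetry on the logarithmic scale is built into the data; the point is only to show how the comparison of skewness on the two scales would be made.

```python
import math
import random

def skewness(values):
    """Moment coefficient of skewness: third central moment over sd cubed."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Synthetic words-per-sentence counts (assumed lognormal; not Yule's data).
random.seed(0)
lengths = [max(1, round(random.lognormvariate(3.0, 0.5))) for _ in range(2000)]

skew_raw = skewness(lengths)                          # arithmetic scale
skew_log = skewness([math.log(n) for n in lengths])   # logarithmic scale

print(f"skewness on the arithmetic scale:  {skew_raw:.2f}")
print(f"skewness on the logarithmic scale: {skew_log:.2f}")
```

With real counts the same two figures would show how far the transformation succeeds in bringing an author's distribution to a symmetrical form.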

In a letter written to me in June 1939 Yule said: ‘I booked up some ten years ago a number 
of distributions of word-length by the number of syllables only. Monosyllables are always 
considerably in the majority (if I remember rightly I omitted “a” and “the”), and different
authors diverged a good deal, but, so far as I can recall, the range from Bunyan to a Times 
Leader was not so very striking.’ 

Neither Yule nor myself was aware that quite extensive investigations in this line had 
been made and published in summary more than fifty years previously, giving frequency 
distributions of word lengths (by the number of letters per word) for several authors, and 
suggesting that similar distributions of the numbers of syllables per word, or the number of 
words per sentence, might well help to throw light on cases of doubtful or disputed author- 
ship. 

Through the kindness of Mr Rushworth Fogg of Glasgow I was put on the track of a paper
published in 1901 by Thomas Corwin Mendenhall,* in which he gives a reference to a still 
earlier paper published in 1887, both of which I have been able to examine. 

Mendenhall states in his first paper (1887) that five or six years previously he had seen a 
suggestion in a book by Augustus de Morgan, possibly his Budget of Paradoxes, that it 
might be possible to identify the author of a book, a poem or a play by the average length of 
the words used in the construction. Mendenhall, however, considered that the method 
which he had adopted in this publication, of using the frequency distribution of words of 
different lengths, was better, as while the average number of letters per word is easily 
obtainable from his data, the shape of the distribution provides considerably increased 
possibilities of comparison. 
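
The counting procedure described above can be sketched in a few lines of Python. This is only an illustration: the rule used here for what counts as a word (a maximal run of letters) and the function name `word_length_spectrum` are assumptions of the sketch, since Mendenhall's own counting conventions are not stated in this note.

```python
import re
from collections import Counter


def word_length_spectrum(text, per=1000):
    """Return ({letters per word: words per `per` words}, mean letters per word)."""
    # Assumption: a "word" is a maximal run of ASCII letters, so apostrophes
    # and hyphens split words. Other conventions would shift the counts.
    words = re.findall(r"[A-Za-z]+", text)
    if not words:
        raise ValueError("no words found in text")
    counts = Counter(len(w) for w in words)
    total = len(words)
    # Frequency of each word length, scaled to a common base of `per` words
    # so that samples of different sizes can be compared.
    spectrum = {k: per * counts[k] / total for k in sorted(counts)}
    # The average length is recoverable from the distribution, but not vice versa.
    mean_length = sum(k * c for k, c in counts.items()) / total
    return spectrum, mean_length


sample = "It was the best of times, it was the worst of times"
spectrum, mean_length = word_length_spectrum(sample)
```

The function returns both quantities together to make the point in the text: the mean is a single summary of the distribution, whereas the full set of frequencies allows comparison length by length.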

Augustus de Morgan was Professor of Mathematics at University College, London. His 
Budget of Paradoxes was first printed as weekly notes in the Athenaeum and republished in 


* See biographical note on p. 255 below. 







book form in 1872, after the author’s death. I have examined the second edition (1915), 
but, although it contains a few references to cases of disputed authorship, I cannot find any 
suggestion about the use of the average number of letters per word. It may be in one of his 
earlier works, or perhaps in one of his Athenaeum notes that was not reprinted in book 
form. 

It is interesting to note that Mendenhall, who was primarily a physicist, was attracted
to the frequency distribution technique by its resemblance to spectroscopic analysis, which 
in 1887 was much to the fore in scientific circles. He writes: ‘It is proposed to analyse a 
composition by forming what may be called a “word spectrum” or “characteristic curve” 
which shall be a graphic representation of the arrangement of words according to their 
length and the relative frequency of their occurrence.’ The mathematics of the compari- 
son of frequency distributions was very little understood at the time when he was writing. 

Mendenhall first discusses samples taken from different books by the same author to see 
if they resemble each other sufficiently closely to make comparisons between one author 
and another likely to be profitable. Most of the evidence is given in the form of thirteen 
diagrams and, unfortunately, in only very few cases are actual numbers presented. 

Mendenhall’s first seven graphs deal with various combinations of ten samples, each of
one thousand words, from Dickens’s Oliver Twist and Thackeray’s Vanity Fair. In his
second figure the distributions of five separate samples of 1000 words from Oliver Twist are
shown superimposed, and there is no doubt as to their general resemblance. In another
(Fig. 1 in this paper) he shows one graph for the whole 10,000 words from Oliver Twist and
another for Vanity Fair. There is very little difference in the average length of the words
(Dickens, 4.324; Thackeray, 4.481), but Vanity Fair has rather more words of 3 and of
7-10 letters, while Oliver Twist has more of 1, 2 and 4-6 letters. Mendenhall was somewhat
disappointed by the lack of difference and commented ‘it is certainly surprising that...so
close an agreement should be found. This is particularly striking in the words of 11, 12 and
13 letters, the numerical composition of which is as follows:


Number of letters    11    12    13
Dickens              85    57    29
Thackeray            85    58    29’


Undaunted by this small difference he next tried two groups of words from John Stuart 
Mill’s Political Economy and his Essay on Liberty, in which he ‘expected to find more longer 
words than in the novelists’. ‘But I confess to considerable surprise in finding from the 
very beginning that, although on the whole the anticipation was realized, the word which 
occurred most frequently was not the three-letter word, as with both Dickens and Thackeray, 
but the word of two letters.’ The explanation he says ‘is to be found in the liberal use of 
prepositions in sentence-building’. The results, given in two separate diagrams, are here 
combined into one (Fig. 2). 

Mendenhall next studied two addresses given by a Mr Edward Atkinson on ‘Labour 
Questions’ to two different audiences, one consisting of working men and the other students 
of a Theological College. There was ‘a marked difference in style’, but the word-length 
distributions (from two samples of 5000 words) were very similar. He comments that 
‘Mr Atkinson’s composition was remarkable in the shortness of the words used’. The 
average length was 4.298 letters; which is, however, only 0.044 shorter than the samples
from Dickens. 













For comparison with all the above studies of works in the English language, Mendenhall 
gave a distribution of the first 5500 words in Caesar’s Commentaries, in Latin. He finds a 
mean word length of 6.065 letters and an entirely different form of curve with peaks at
2, 5 and 7 letters (see Fig. 3). This is of course connected with the Latin construction of 
adding to the main root for inflexions instead of using additional small words. 









































[Figs. 1 and 2. Horizontal axis in both: No. of letters per word. Fig. 1 legend: Oliver Twist; Vanity Fair; 10,000 words from each. Fig. 2 legend: John Stuart Mill, Political Economy; Essay on Liberty; 5000 words from each.]


Fig. 1. Samples of 10,000 words each from Dickens’s Oliver Twist and Thackeray’s Vanity Fair. 
Redrawn from Mendenhall (1887, fig. 7). 

Fig. 2. Samples of 5000 words each from two works of John Stuart Mill. Redrawn from Mendenhall 
(1887, figs. 8, 9). 















































[Figs. 3 and 4. Horizontal axis in both: No. of letters per word. Fig. 3 legend: Latin; Italian; German. Fig. 4 legend: Shakespeare, 400,000 words; Bacon, 200,000 words.]

Fig. 3. Examples of distribution of word length in languages other than English. (1) Latin, (2) 
Italian, (3) German. Redrawn from Mendenhall (1887, fig. 13 and 1901, fig. 2). 


Fig. 4. Comparison of frequency distribution of word length in two very large samples from the plays 
of Shakespeare and works of Bacon. Redrawn from Mendenhall (1901, fig. 7). 


He considers that, for a really reliable estimate of the characteristic curve for an author, 
a sample curve of 100,000 words might be necessary, and concludes his first paper as follows:
‘Many interesting applications of the process will suggest themselves to almost every 
reader; the most notable, of course, being the attempt to solve questions of disputed 
authorship, such as exist in reference to the letters of Junius, the plays of Shakespeare, and 
other less widely known examples. It might also be used in comparative language studies, 













in tracing the growth of a language, and in studying the growth of the vocabulary from 
childhood to manhood.’ 

‘If striking differences are found between the curve of known and suspected compositions
of any writer, the evidence against identity of authorship would be quite conclusive. If 
the two compositions should produce curves which are practically identical, the proof of 
a common origin would be less convincing for it is possible though not probable, that two 
writers might show identical curves.’ 

It was not until 14 years later that Mendenhall returned to the problem in a paper in
Popular Science Monthly, published in December 1901. In this he repeats some of the dis- 
cussion and diagrams from his earlier paper dealing with Dickens, Thackeray, John Stuart 
Mill and Mr Atkinson; but in addition to his earlier diagrams showing analysis of a Latin 
work he gives examples of single authors in Italian, Spanish, French and German (see Fig. 3). 
The French and Spanish curves, possibly by the idiosyncrasies of the authors chosen, have 
their peaks in the 2-letter words; the Italian has two peaks at 2 and 5 letters; while the German 
has a peak at 3 letters, but with more longer words, reaching one word of 27 letters. 

After this introduction, he settles down to a discussion of the value of his technique in 
the study of the authorship of the plays of Shakespeare. For this the lengths of nearly two
million words were counted from the works of Shakespeare and of some of his contemporaries.
Most unfortunately the data are all condensed into half a dozen small diagrams, and not one 
table of the actual numbers is given. Does his evidence still exist, hidden away somewhere? 

Mendenhall says: ‘The result from the start, with the first group of 1000 words, was a 
decided surprise. Two things appeared from the beginning: Shakespeare’s vocabulary 
consisted of words whose average length was a trifle below four letters, less than any writer 
of English previously studied; and his word of greatest frequency was the four-letter word, 
a thing never before met with’ (Fig. 4). 

A comparison of the diagrams with those of Thackeray and Dickens shows that Shake- 
speare had a higher proportion of words with 1, 2, 4 and 5 letters, and a lower proportion of 
words with 3 letters and of 6 letters upwards; which accounts for the fact that while his peak 
is higher, his average number of letters per word is lower. In modern terminology, he would 
have a smaller standard deviation. 
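
The remark about the standard deviation can be checked roughly against the counts tabulated later in this note (Table 1 for Vanity Fair, Table 2 for Shakespeare). The sketch below is only indicative: the Shakespeare figures are readings taken from Mendenhall's diagrams rather than his original counts, and the use of the population form of the standard deviation (divisor N) is an assumption of the sketch.

```python
from math import sqrt

# Words of each length, transcribed from Table 1 (Vanity Fair, 2000 words).
thackeray = {1: 58, 2: 315, 3: 480, 4: 351, 5: 244, 6: 154, 7: 152,
             8: 100, 9: 63, 10: 43, 11: 16, 12: 15, 13: 4, 14: 5}

# Words of each length per 1000, transcribed from Table 2 (Shakespeare,
# averaged readings from five of Mendenhall's diagrams).
shakespeare = {1: 47.6, 2: 175.8, 3: 225.0, 4: 237.6, 5: 124.4, 6: 71.2,
               7: 52.6, 8: 31.6, 9: 18.4, 10: 9.0, 11: 3.4, 12: 2.0,
               13: 1.0, 14: 0.4}


def mean_and_sd(freq):
    """Mean and (population) standard deviation of letters per word."""
    total = sum(freq.values())
    mean = sum(k * f for k, f in freq.items()) / total
    variance = sum(f * (k - mean) ** 2 for k, f in freq.items()) / total
    return mean, sqrt(variance)


def peak(freq):
    """Word length with the greatest frequency."""
    return max(freq, key=freq.get)


mean_t, sd_t = mean_and_sd(thackeray)
mean_s, sd_s = mean_and_sd(shakespeare)
```

On these figures the Shakespeare distribution has its peak at 4 letters and the Thackeray distribution at 3, while both the mean and the standard deviation come out lower for Shakespeare.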

Altogether about 400,000 words were counted including ‘in whole or in part, nearly all 
his most famous plays’, and it was found that this characteristic curve is most persistent— 
that based on the first 50,000 words differing very little from the whole count. In a diagram
giving two examples of 200,000 words each, it is practically impossible to separate the two 
lines, in spite of the fact that Mendenhall says the differences have been of necessity slightly 
exaggerated in order to make them show at all! 

A comparison was next made of Shakespeare’s prose and his poetry as exemplified by 
The Rape of Lucrece and Venus and Adonis. The prose gave more shorter words, particularly 
of 2 letters, and fewer words of 5, 6 and 7 letters; but both gave the characteristic peak at 
the 4-letter word. Mendenhall writes: ‘ At first this was thought to be a general characteristic 
of his time, but this was found not to be so.’ 

A study was then made of a number of works by Francis Bacon, including his Henry VII 
and his Advancement of Learning, with a total of nearly 200,000 words. The frequency dis- 
tribution was quite different from that of Shakespeare (see Fig. 4) with the peak at the 
3-letter word, with more 2-letter words, fewer with 4, 5 and 6 letters, and more longer words 
with 7-13 letters. Mendenhall here comments that ‘the reader is at liberty to draw any 










conclusion he pleases from this diagram. Should he conclude that, in view of the extra- 
ordinary difference in these lines, it is clear that Bacon could not have written the things 
ordinarily attributed to Shakespeare...the question still remains, who did?’ 

An examination of the works of Ben Jonson, in two groups of 75,000 words, showed once
more a peak at the 3-letter word; but an extensive study of the plays of Beaumont and 
Fletcher showed that on the final average the number of 4-letter words was slightly greater 
than those of 3 letters, although the excess was by no means persistent in smaller samples. 
The final curve was not unlike that of Shakespeare, and Mendenhall suggested that the ‘lack 
of persistency of form among small groups’ might be due to the dual authorship. 


[Figs. 5 and 6. Fig. 5: horizontal axis, No. of letters per word; legend: Marlowe; Shakespeare. Fig. 6: horizontal axis, No. of letters per word (log scale); vertical axis, accumulated percentage (probability scale).]

Fig. 5. Comparison of frequency distribution of word length in two large samples from the plays of 
Shakespeare and of Christopher Marlowe. Redrawn from Mendenhall (1901, fig. 9). 


Fig. 6. The number of letters per word, on a logarithmic scale, from works of Thackeray and Shake- 
speare (as shown by Mendenhall) plotted against the accumulated total as a percentage of the 
whole sample on a probability scale. It indicates some resemblance to a log-normal distribution 
for words up to about 8 letters, but differing above this level. 


When, however, he turned his attention to the plays of Christopher Marlowe ‘something 
akin to a sensation was produced among those engaged in the work’. ‘In the characteristic 
curve of his plays Marlowe agrees with Shakespeare about as well as Shakespeare agrees 
with himself’ (Fig. 5). 

Finally, Mendenhall pointed out that a dramatic composition Armada Days written by 
Prof. Shaler of Harvard, in which the author endeavoured to compose in the spirit and the 
style of the ‘ Elizabethan days’, gave a curve (from only about 20,000 words) with ‘excess of 
the 4-letter word and in other respects decidedly Shakespearian’. Mendenhall does not 
give this curve or any figures. 

Discussion 


We are not concerned here so much with the results that Mendenhall obtained, or with 
their repercussions, but rather with the general value of the technique. There appears to 
be little doubt that he was the first to act on the suggestion of de Morgan, and that his own 
method of using the frequency distribution, instead of merely the average length of word, 
was a distinct improvement, although the average length would not normally be given 



















to-day without the standard deviation. The skew form of the curve makes this latter measure 
less reliable than it might otherwise be. 

Mendenhall’s sampling method was to take blocks of 1000 words each ‘at the beginning 
of the volume and, after a few thousand words had been counted, the book was opened near 
the middle and the count continued’. This method is not above reproach, but, in view of the 
large number of samples and the general close resemblances, it is unlikely that a more 
randomized method would produce any measurably different result. In the case of the 
plays of Shakespeare the sampling was large enough to justify the statement that it included 
nearly all the most famous plays ‘in whole or in part’. 

That Mendenhall appreciated the difference between the statistical method and evidence 
based on selected phraseology believed to be characteristic is clear from the following 
quotations: ‘The chief merit of the method consisted in the fact that its application required 
no exercise of judgement’ and that ‘characteristics might be revealed which the author 
could make no attempt to conceal, being himself unaware of their existence’; and again, 
‘the conclusions reached through its use would be independent of personal bias, the work 
of one person in the study of an author being at once comparable with the work of any 
other’. 

That Mendenhall saw the wide range of possibilities is clear from his statement: ‘it is 
hardly necessary to say that the method is not necessarily confined to the analysis of a com- 
position by means of word-length: it may equally be applied to the study of syllables, of 
words in sentences, and in various ways.’ And I have already quoted his suggestion as to 
its value in comparative studies. 

Two additional comments may be of interest. The curve of the frequency distribution of 
words of different lengths is in every case skew, with the peak usually at 3 or 4 letters per 
word, and the tail running off generally to 15 or 16, but sometimes to higher than this. 
In my contribution to the study already mentioned, I showed that by the use of a logarithmic
scale the skew distributions of sentence length became approximately symmetrical, and 
so the distribution resembled a log-normal. It is of interest to see if Mendenhall’s figures 
for word length show a similar relation. We can, however, note beforehand that the length 
of a sentence is under the conscious control of a writer, who may stop when he pleases. The 
lengths of words are not so controlled and selection of words for reason of their length alone 
is not likely to occur. 

Most unfortunately only three sets of numbers are given by Mendenhall, all in his first 
1887 paper. They are for 1000 words in Oliver Twist and two sets of the same size for Vanity 
Fair. Taking the latter we find that the accumulated totals up to each successive number of 
letters per word, expressed as percentages of the whole, are as shown in Table 1. 

When these results are plotted on to log-probability paper the result is as shown in Fig. 6.
There is an approximately straight-line relation up to about 8 letters per word, but above
that there is a definite departure. The straight-line portion suggests a log-normal distribution
with a mean log at 0.53 and a standard deviation of approximately 0.26. On an arithmetic
scale this is equivalent to a geometric mean of about 3.4 and a standard deviation of
× or ÷ 1.8.* The arithmetic mean is 4.5 letters per word.

* When a frequency distribution is skew on an arithmetic grouping of the data but approximately
symmetrical when a geometric scale is used, the standard deviation cannot be expressed on
an arithmetic scale as ‘+ or −’. The use of the expression ‘3.4, × or ÷ by 1.8’ implies that
approximately 33% of the observations will be between 3.4 and 3.4 × 1.8; 33% will be between 3.4
and 3.4 ÷ 1.8; and approximately 17% will be above, and below, these limits.
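
The graphical fit described above can be imitated numerically. The sketch below converts the accumulated percentages of Table 1 to Normal deviates, plots them (in effect) against the logarithm of the number of letters, and fits a straight line to the points for 1 to 8 letters. Two choices here are assumptions of the sketch and not necessarily those made in drawing Fig. 6: each accumulated percentage is placed at the logarithm of the word length itself, and the line is fitted by least squares rather than by eye.

```python
from math import log10
from statistics import NormalDist

# Words of 1, 2, ..., 14 letters in the 2000-word Vanity Fair sample (Table 1).
counts = [58, 315, 480, 351, 244, 154, 152, 100, 63, 43, 16, 15, 4, 5]
total = sum(counts)

# (log10 of word length, Normal deviate of accumulated proportion) for 1-8 letters,
# i.e. the portion that looks straight on log-probability paper.
points = []
accumulated = 0
for length, count in enumerate(counts, start=1):
    accumulated += count
    if length <= 8:
        deviate = NormalDist().inv_cdf(accumulated / total)
        points.append((log10(length), deviate))

# Ordinary least-squares line: deviate = intercept + slope * log10(length).
n = len(points)
mean_x = sum(x for x, _ in points) / n
mean_z = sum(z for _, z in points) / n
slope = (sum((x - mean_x) * (z - mean_z) for x, z in points)
         / sum((x - mean_x) ** 2 for x, _ in points))
intercept = mean_z - slope * mean_x

# For a log-normal distribution the line crosses deviate 0 at the mean log,
# and its slope is the reciprocal of the standard deviation of the logs.
mean_log = -intercept / slope
sd_log = 1 / slope

geometric_mean = 10 ** mean_log      # back on the arithmetic scale
multiplicative_sd = 10 ** sd_log     # the "x or / by" factor
```

A least-squares line need not agree exactly with one drawn by eye, so small differences from the quoted 0.53 and 0.26 are to be expected; the conversion back to the arithmetic scale is simply the antilogarithm of each quantity.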










In his second paper Mendenhall gives five graphs showing Shakespeare’s frequency distribution
per 1000 words in comparison with other authors. With a lens and a fine scale it is
possible to read the numbers to about three units, but unfortunately the results so obtained 
from the five diagrams do not agree. This is possibly because (as he admits in one case) 
Mendenhall exaggerated the differences in the diagrams in order to separate the two lines. 
I have made an estimate from each of the five diagrams, and the average values are given in 
Table 2. 
































Table 1

No. of letters    No. of words    Accumulated total    Accumulated total
per word          out of 2000     no. of words         as % of 2000

      1                58                58                   2.9
      2               315               373                  18.7
      3               480               853                  42.7
      4               351              1204                  60.2
      5               244              1448                  72.4
      6               154              1602                  80.1
      7               152              1754                  87.7
      8               100              1854                  92.7
      9                63              1917                  95.9
     10                43              1960                  98.0
     11                16              1976                  98.8
     12                15              1991                  99.6
     13                 4              1995                  99.8
     14                 5              2000                 100.0

Table 2

Letters    Words        Letters    Words        Letters    Words

   1        47.6           6        71.2          11        3.4
   2       175.8           7        52.6          12        2.0
   3       225.0           8        31.6          13        1.0
   4       237.6           9        18.4          14        0.4
   5       124.4          10         9.0           —         —























When the accumulated totals are plotted on log-probability paper as in the previous case, 
the result (Fig. 6) indicates a fairly regular departure from the straight line, although once 
again the break is more distinct above 7 letters per word. 

I have also attempted to get, from Mendenhall’s diagram of five samples of 1000 words 
each from Oliver Twist, some measure of the error of his results. 

The frequency distribution of words of certain lengths in the five samples is approxi- 
mately as given in Table 3. 

If the size of the sample were increased sixteen times (to 80,000 words) without altering
the pattern of the material sampled, the S.E. of the mean would be reduced to a quarter of
the above, or approximately 2.1, 1.2, 1.5 and 0.8 for words of 3-6 letters respectively.
The error is smaller in the less frequent words, but greater in proportion to the mean.










Thus for the comparison of two samples of this size (assuming the same order of variation
in each) the S.E. of the difference would be approximately 1.4 times the above or 2.9, 1.7,
2.1 and 1.1. Thus differences in number of words per 1000 would have to be of the order of
5.7, 3.3, 4.1 and 2.1 to be significant at the 1 in 20 level, and 7.5, 4.4, 5.4 and 2.8 to be
significant at the 1 in 100 level. The five samples on which the above rough estimate is made
were, however, consecutive samples of 1000 words from one work; when different works, 
written at different periods by the same author, are combined the error of the mean would 
almost certainly be greater. 
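
The arithmetic behind these rough limits can be retraced from the five-sample readings of Table 3. The sketch below is an illustration only: it uses the usual Normal multipliers 1.96 and 2.58 for the 1 in 20 and 1 in 100 levels (an assumption, since the multipliers are not stated in the text) and carries unrounded values through each step, so its last figures can differ slightly from those quoted, which appear to have been rounded at each stage.

```python
from math import sqrt
from statistics import mean, stdev

# Words per 1000 of each length in five samples from Oliver Twist (Table 3).
samples = {
    3: [221, 232, 236, 254, 268],
    4: [170, 175, 183, 186, 198],
    5: [95, 102, 120, 122, 123],
    6: [83, 92, 94, 97, 103],
}

results = {}
for letters, values in samples.items():
    # Standard error of the mean of the five samples (5000 words in all).
    se_5000 = stdev(values) / sqrt(len(values))
    # Sixteen times the material reduces the standard error by sqrt(16) = 4.
    se_80000 = se_5000 / sqrt(16)
    # Difference of two independent means with equal standard errors.
    se_difference = sqrt(2) * se_80000
    results[letters] = {
        "mean": mean(values),
        "se_5000": se_5000,
        "se_80000": se_80000,
        "se_difference": se_difference,
        "limit_1_in_20": 1.96 * se_difference,
        "limit_1_in_100": 2.58 * se_difference,
    }
```

For example, `results[3]["se_5000"]` reproduces the 8.36 of Table 3, and dividing by four gives the figure of about 2.1 used in the text.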

A careful examination of Mendenhall’s diagram giving the comparison of the distribu- 
tions of Shakespeare and Marlowe suggests measurable differences only in words up to 
5 letters, Marlowe differing from Shakespeare approximately as follows: 1 letter, 5 less; 
2 letters, 3 less; 3 letters, 3 more; 4 letters, no difference; and 5 letters, 5 more. All the other 

















Table 3

                                                             S.E. of mean
No. of letters    Five samples               Mean            for 5000 words

      3           221, 232, 236, 254, 268    242.2               8.36
      4           170, 175, 183, 186, 198    182.4               4.82
      5            95, 102, 120, 122, 123    112.4               5.80
      6            83,  92,  94,  97, 103     93.8               3.28








word lengths are indistinguishable in the diagram. These differences may have been 
exaggerated in the diagram. The numbers are in words per 1000 in large samples—in the 
case of Shakespeare over 200,000 words, but in the case of Marlowe the size of samples is 
not given. On the other hand, the comparison of Bacon with Shakespeare (see Fig. 4)
shows a difference of nearly 60 words per thousand with 4-letter words. 

It would seem likely that real differences between authors would show themselves as 
sequences of departures in the same direction for several consecutive word lengths. One 
can imagine one author differing from another in an unconscious preference for longer 
words or for shorter words, but it is unlikely that one author would prefer words of, say, 11 
letters in preference to 12, while another author would prefer the 12 to the 11. Thus, rapid 
changes of departure directions in sequence would be less convincing than blocks of depar- 
tures of similar sign as evidence of real differences, and would be more likely to be due to 
error. 
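
The idea of looking for blocks of departures of similar sign can be made concrete by listing the runs of sign in the differences between two distributions. The sketch below applies it to the Vanity Fair counts of Table 1 (halved to give words per 1000) and the Shakespeare readings of Table 2; the function name `sign_runs` and the choice of these two samples are illustrative assumptions, not part of Mendenhall's procedure.

```python
def sign_runs(first, second):
    """Runs of consecutive word lengths over which first - second keeps one sign.

    Returns a list of (sign, run length) pairs, with sign +1, -1 or 0.
    """
    runs = []
    for x, y in zip(first, second):
        difference = x - y
        sign = (difference > 0) - (difference < 0)
        if runs and runs[-1][0] == sign:
            runs[-1] = (sign, runs[-1][1] + 1)   # extend the current run
        else:
            runs.append((sign, 1))               # start a new run
    return runs


# Words per 1000 for lengths 1-14: Table 1 counts halved, and Table 2.
thackeray = [c / 2 for c in
             [58, 315, 480, 351, 244, 154, 152, 100, 63, 43, 16, 15, 4, 5]]
shakespeare = [47.6, 175.8, 225.0, 237.6, 124.4, 71.2, 52.6,
               31.6, 18.4, 9.0, 3.4, 2.0, 1.0, 0.4]

runs = sign_runs(thackeray, shakespeare)
```

Here the differences fall into four runs, the last being an unbroken block of nine positive departures from 6 letters upwards, which is the kind of pattern the argument above regards as more convincing than rapid alternation of sign.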

Mendenhall, in his 1887 paper, calls attention to the fact that in an analysis of Dickens’s 
Christmas Carol, words of 7 letters appeared to be unduly numerous, due to the fact that 
the character ‘Scrooge’, frequently referred to, is a word of this length. It would be desirable
to leave names of persons and places out of any tabulation. 


BIOGRAPHICAL NOTE 
Thomas Corwin Mendenhall was born in Ohio on 4 October 1841 and died there on 22 March 
1924. He was the descendant of a Benjamin Mendenhall who emigrated from England 
(probably from Wiltshire) in 1686 to join Penn’s Colony and who settled at Concord, 













Pennsylvania. T. C. Mendenhall spent some early years as a school teacher, but in 1873 
became the first Professor of Physics and Mechanics at the newly founded Ohio Agri- 
cultural and Mechanical College. He was Professor of Physics in the University of Tokyo 
from 1878 to 1881 and in the Ohio State University from 1881 to 1886. He then became 
President of the Rose Polytechnic Institute in Indiana and was elected the following year
to the National Academy of Sciences. After a few years as Superintendent of the U.S.A.
Coast and Geodetic Survey he became President of the Worcester Polytechnic Institute, 
where he remained until his retirement at the age of 60. 

His biography by Henry Crew (Biograph. Mem. Nat. Acad. Sci., Wash., 16, 331-51), to
which I am indebted for the above information, lists about sixty publications in physics, 
particularly geophysics, units of electrical measurement, state boundary lines in the U.S.A. 
and many other related subjects. The first of the two papers at present under discussion is 
listed, but not the second. There is no mention of his interest in the statistics of literary 
style, but it is said that he left in MSS. about 900 pages of an autobiography which has
never been published. If still ... it might repay study.


REFERENCES

MENDENHALL, T. C. (1887). The characteristic curves of composition. Science, 9 (214, supplement),
237-49.

MENDENHALL, T. C. (1901). A mechanical solution of a literary problem. Pop. Sci. Mon. 9, 97-105.

DE MORGAN, A. (1872). A Budget of Paradoxes. London (2nd edition 1915).

WILLIAMS, C. B. (1940). A note on the statistical analysis of sentence length as a criterion of literary
style. Biometrika, 31, 356-61.

WILLIAMS, C. B. (1952). Statistics as an aid to literary studies. Penguin Science News, no. 24,
pp. 99-106.

YULE, G. U. (1939). On sentence-length as a statistical characteristic of style in prose. Biometrika,
30, 363-84.






















A GOODNESS OF FIT TEST FOR SPECTRAL DISTRIBUTION 
FUNCTIONS OF STATIONARY TIME SERIES WITH 
NORMAL RESIDUALS 


By A. M. WALKER 
Design and Analysis of Scientific Experiment, 6 Keble Road, Oxford 


1. INTRODUCTION


In recent years problems of statistical inference associated with the spectral analysis of 
stationary time series have been studied by a number of authors (see, for example, Bartlett, 
1950, 1954; Grenander, 1951; Grenander & Rosenblatt, 1953; Whittle, 1951, 1954). In 
most of these problems the aim has been to obtain information about the spectral density 
function of the series, which is usually assumed to be absolutely continuous. Grenander & 
Rosenblatt (1953) give asymptotic confidence bands for the spectral distribution function, 
but still require the assumption of an absolutely continuous spectral density function. 
The present paper adopts an approach which is associated directly with the spectral 
distribution function, and yields a general large sample goodness of fit test of the hypothesis 
that a stationary Normal series has a specified spectral distribution function which is not
necessarily absolutely continuous but may have certain points of discontinuity. 


2. THE BASIC TEST STATISTICS AND THEIR MAIN PROPERTIES 


Let $\{X(t)\}$ $(t = 0, \pm 1, \pm 2, \ldots)$ be a time series which is stationary to the second order. We
may assume without loss of generality that $X(t)$ is measured from its mean, i.e. that
$E[X(t)] = 0$.

Then it is well known that we can write

$$X(t) = \int_{-\pi}^{\pi} e^{it\omega}\,dZ(\omega) \tag{1}$$

(the integral being interpreted in the mean square sense), where $Z(\omega)$ is a complex-valued
orthogonal process defined over $(-\pi, \pi)$ with†

$$E[\,|Z(\omega_2) - Z(\omega_1)|^2\,] = F(\omega_2) - F(\omega_1), \qquad E[\{Z(\omega_4) - Z(\omega_3)\}\{Z(\omega_2) - Z(\omega_1)\}^*] = 0, \tag{2}$$

when the intervals $(\omega_1, \omega_2)$, $(\omega_3, \omega_4)$ do not overlap,
$F(\omega)$ being the spectral distribution function (not necessarily normalized to unity). $F(\omega)$ is
non-negative and non-decreasing over $(-\pi, \pi)$ and is related to the autocovariance function
$\sigma^2\rho(\tau) = E[X(t)X(t+\tau)]$ by the equation

$$\sigma^2\rho(\tau) = \int_{-\pi}^{\pi} e^{i\tau\omega}\,dF(\omega). \tag{3}$$


+ The symbol * denotes complex conjugate. 













The inversion formula corresponding to (1) is, with $Z(0) = 0$,

$$Z(\omega) = \frac{1}{2\pi} \sum_{r=-\infty}^{\infty} \frac{1 - e^{-ir\omega}}{ir}\,X(r), \tag{4}$$

for every continuity point $\omega$ of $F$ and therefore of $Z$. Here the infinite sum is to be interpreted
as a mean-square limit $\lim_{n\to\infty} \sum_{r=-n}^{n}$ and $(1 - e^{-ir\omega})/(ir)$ is to be replaced by $\omega$ when $r = 0$.
This suffices to determine $Z(\omega)$ uniquely for all $\omega$ in $(-\pi, \pi)$, since at points of discontinuity
it can always be defined to be continuous on the right (corresponding to $F(\omega)$ being
continuous on the right).

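Let $A(\omega)$ and $B(\omega)$ be the real and imaginary parts of $Z(\omega)$. Then we have

$$E[A(\omega_2) - A(\omega_1)]^2 = E[B(\omega_2) - B(\omega_1)]^2 = \tfrac{1}{2}[F(\omega_2) - F(\omega_1)], \tag{5}$$

$$E[\{A(\omega_4) - A(\omega_3)\}\{A(\omega_2) - A(\omega_1)\}] = E[\{B(\omega_4) - B(\omega_3)\}\{B(\omega_2) - B(\omega_1)\}] = 0 \tag{6}$$

when $(\omega_1, \omega_2)$, $(\omega_3, \omega_4)$ do not overlap, and

$$E[\{A(\omega_4) - A(\omega_3)\}\{B(\omega_2) - B(\omega_1)\}] = 0 \tag{7}$$

for any intervals $(\omega_1, \omega_2)$, $(\omega_3, \omega_4)$.

These results do not follow directly from (2), but can easily be established using (4) (cf. also
Doob, 1952, p. 482).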

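Suppose now that we are given a set of observations at $2n+1$ consecutive times, say
$X(-n), X(-n+1), \ldots, X(0), X(1), \ldots, X(n)$. Then we can approximate to $A(\omega)$ and $B(\omega)$
by the finite sums

$$A_{2n}(\omega) = \frac{1}{2\pi} \sum_{r=-n}^{n} \left(\frac{\sin r\omega}{r}\right) X(r), \tag{8}$$

$$B_{2n}(\omega) = \frac{1}{2\pi} \sum_{r=-n}^{n} \left(\frac{\cos r\omega - 1}{r}\right) X(r), \tag{9}$$

the coefficients for $r = 0$ being taken as $\omega$ in (8) and 0 in (9).

If $n$ is fairly large, we should expect that (5), (6) and (7) will remain approximately true when
$A_{2n}(\omega)$ and $B_{2n}(\omega)$ are substituted for $A(\omega)$ and $B(\omega)$, provided that $\omega$ is a continuity point
of $F$. This suggests that quantities of the form $A_{2n}(\omega_2) - A_{2n}(\omega_1)$ or $B_{2n}(\omega_2) - B_{2n}(\omega_1)$ may
be used to construct tests of the goodness of fit of the observations to the spectral distribution
function $F(\omega)$.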
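
As an illustration of how such finite sums might be evaluated, the following sketch takes the real and imaginary parts of the truncated form of (4), so that the coefficient of $X(r)$ is $\sin(r\omega)/r$ in the first sum and $(\cos r\omega - 1)/r$ in the second, with the $r = 0$ coefficients equal to $\omega$ and 0. The printed forms of (8) and (9) are badly damaged in this copy, so these expressions, the sign convention for the second sum, and the function names `a_2n` and `b_2n` are assumptions of the sketch.

```python
from math import cos, pi, sin


def a_2n(x, omega):
    """Real part of the truncated inversion sum for observations X(-n), ..., X(n).

    `x` is a list of 2n + 1 values, with x[r + n] holding X(r).
    """
    if len(x) % 2 != 1:
        raise ValueError("expected an odd number (2n + 1) of observations")
    n = (len(x) - 1) // 2
    total = 0.0
    for r in range(-n, n + 1):
        # sin(r * omega) / r tends to omega as r tends to 0.
        coefficient = omega if r == 0 else sin(r * omega) / r
        total += coefficient * x[r + n]
    return total / (2 * pi)


def b_2n(x, omega):
    """Imaginary part of the truncated inversion sum (same layout of `x`)."""
    if len(x) % 2 != 1:
        raise ValueError("expected an odd number (2n + 1) of observations")
    n = (len(x) - 1) // 2
    total = 0.0
    for r in range(-n, n + 1):
        if r != 0:  # the r = 0 coefficient, the limit of (cos - 1)/r, is zero
            total += (cos(r * omega) - 1) / r * x[r + n]
    return total / (2 * pi)


observations = [0.3, -1.2, 0.7, 2.0, -0.5]   # made-up values, n = 2
increment_a = a_2n(observations, 2.0) - a_2n(observations, 1.0)
increment_b = b_2n(observations, 2.0) - b_2n(observations, 1.0)
```

With these definitions the first sum is an odd function of $\omega$ and the second an even function, and both vanish at $\omega = 0$.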

We therefore take such quantities as our basic test statistics. Since $B_{2n}$ is an even function
of $\omega$ and $A_{2n}$ an odd function we need consider only non-negative values of $\omega$; in view
of this it is convenient to express our results in terms of the spectral distribution function
$F_+(\omega)$ defined over $(0, \pi)$, which is such that

$$d_\omega F_+(\omega) = 2\,d_{-\omega}F(-\omega) = 2\,d_\omega F(\omega) \quad (0 < \omega < \pi) \quad \text{and} \quad \sigma^2\rho(\tau) = \int_0^{\pi} \cos \omega\tau \, dF_+(\omega). \tag{10}$$


The main properties of these statistics, corresponding to equations (5), (6) and (7), are 
as follows: 


(i) Let $(\omega_1, \omega_2)$ be a continuity interval for $F_+(\omega)$, i.e. an interval whose end-points are
continuity points of $F_+(\omega)$, with $\omega_1 \gg \pi/n$ and $\pi - \omega_2 \gg \pi/n$, and let $n$ be so large that

$$F_+(\omega_i + A\pi/n) - F_+(\omega_i - A\pi/n) \quad (i = 1, 2), \quad \text{where } A = O(1),$$

is small compared with $F_+(\omega_2) - F_+(\omega_1)$. Then to a good approximation

$$E[A_{2n}(\omega_2) - A_{2n}(\omega_1)]^2 = E[B_{2n}(\omega_2) - B_{2n}(\omega_1)]^2 = \tfrac{1}{4}[F_+(\omega_2) - F_+(\omega_1)]. \tag{11}$$

Also if $\omega_2 \gg \pi/n$ and $F_+(\omega_2 + A\pi/n) - F_+(\omega_2 - A\pi/n)$, $F_+(A\pi/n)$ are small compared with
$F_+(\omega_2)$, we may put $\omega_1 = 0$ in (11) provided that $F_+$ has no discontinuity at $\omega = 0$. A similar
result holds with $\omega_2 = \pi$, provided that $F_+$ has no discontinuity at $\omega = \pi$.

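(ii) Let $(\omega_1, \omega_2)$ and $(\omega_3, \omega_4)$ be continuity intervals for $F_+(\omega)$, and either $\omega_3 - \omega_2 \gg \pi/n$ or
$\omega_3 - \omega_2 = O(\pi/n)$ and $F_+(\tfrac{1}{2}(\omega_2 + \omega_3) + A\pi/n) - F_+(\tfrac{1}{2}(\omega_2 + \omega_3) - A\pi/n)$, with $A = O(1)$, be small
compared with $F_+(\omega_2) - F_+(\omega_1)$, $F_+(\omega_4) - F_+(\omega_3)$. Then to a good approximation

$$E[\{A_{2n}(\omega_4) - A_{2n}(\omega_3)\}\{A_{2n}(\omega_2) - A_{2n}(\omega_1)\}] = E[\{B_{2n}(\omega_4) - B_{2n}(\omega_3)\}\{B_{2n}(\omega_2) - B_{2n}(\omega_1)\}] = 0. \tag{12}$$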


(iii) For any intervals $(\omega_1, \omega_2)$, $(\omega_3, \omega_4)$,

$$E[\{A_{2n}(\omega_4) - A_{2n}(\omega_3)\}\{B_{2n}(\omega_2) - B_{2n}(\omega_1)\}] = 0. \tag{13}$$


This last result, which is exact, follows at once from the fact that in

$$\sum_{r,s=-n}^{n}\left(\frac{\cos r\omega_3-\cos r\omega_4}{r}\right)\left(\frac{\sin s\omega_2-\sin s\omega_1}{s}\right)\sigma_X^2\,\rho(r-s),$$

the terms with $(r,s)=(r_0,s_0)$ and $(r,s)=(-r_0,-s_0)$ cancel, there being no contribution from $(r,s)=(0,0)$. The derivation of (i) and (ii) is more complicated, and certain difficulties arise in assessing the accuracy of the approximations (it will be noted that the conditions under which this may be expected to be high have not been stated at all precisely). The discussion of these at this point would mean a fairly lengthy digression from the main argument of the paper, and is therefore postponed until the concluding section (§6).
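As a rough numerical illustration of (11), consider the special case of an independent series with unit variance, for which $F_+(\omega)=\omega/\pi$. The second moments are then plain sums of squared weights, and (11) predicts that each sum is close to $\pi(\omega_2-\omega_1)$. The sketch below assumes that the weights multiplying $X(r)$ in $2\pi\{A_{2n}(\omega_2)-A_{2n}(\omega_1)\}$ and $2\pi\{B_{2n}(\omega_2)-B_{2n}(\omega_1)\}$ are $(\sin r\omega_2-\sin r\omega_1)/r$ and $(\cos r\omega_1-\cos r\omega_2)/r$; the first form is the one appearing in the worked example of §6, the second is an assumption made by analogy. The function name and the chosen interval are illustrative only.

```python
import math

def second_moment_sums(n, w1, w2):
    """Sums over r = -n..n of the squared weights of X(r) in
    2*pi*{A(w2)-A(w1)} and 2*pi*{B(w2)-B(w1)} (assumed forms, see text)."""
    sum_a = (w2 - w1) ** 2   # r = 0 term: limit of (sin r w2 - sin r w1)/r
    sum_b = 0.0              # r = 0 term: limit of (cos r w1 - cos r w2)/r
    for r in range(1, n + 1):
        a = (math.sin(r * w2) - math.sin(r * w1)) / r
        b = (math.cos(r * w1) - math.cos(r * w2)) / r
        sum_a += 2.0 * a * a  # the terms for r and -r have equal squares
        sum_b += 2.0 * b * b
    return sum_a, sum_b

n = 192
w1, w2 = 8 * math.pi / 24, 9 * math.pi / 24
sum_a, sum_b = second_moment_sums(n, w1, w2)
target = math.pi * (w2 - w1)   # value implied by (11) for white noise
ratio_a = sum_a / target
ratio_b = sum_b / target
print(ratio_a, ratio_b)
```

Both ratios fall slightly below unity, the shortfall being the neglected tail of the series beyond $|r|=n$.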

Now let $\{X(t)\}$ be a Normal series. Then $A_{2n}(\omega)$, $B_{2n}(\omega)$ are Normal random variables, and hence, from (11) and (13),

$$[A_{2n}(\omega_2)-A_{2n}(\omega_1)]^2+[B_{2n}(\omega_2)-B_{2n}(\omega_1)]^2$$

is, under the conditions stated in (i) above, distributed approximately as $\tfrac14\{F_+(\omega_2)-F_+(\omega_1)\}\chi^2_{(2)}$, $\chi^2_{(m)}$ denoting a random variable distributed in the standard $\chi^2$ form with m degrees of freedom. Also, using (12), it follows that if $0=\omega_0<\omega_1<\dots<\omega_k=\pi$ is a subdivision of the interval $(0,\pi)$ such that $\omega_0,\omega_1,\dots,\omega_k$ are continuity points of $F_+(\omega)$,† and the 'power' (increment in $F_+$) in each interval $(\omega_i,\omega_{i+1})$ is large compared with that in $(\omega_i-\lambda\pi/n,\ \omega_i+\lambda\pi/n)$ and $(\omega_{i+1}-\lambda\pi/n,\ \omega_{i+1}+\lambda\pi/n)$, $\lambda=O(1)$ $(i=0,1,\dots,k-1)$, then

$$[A_{2n}(\omega_{i+1})-A_{2n}(\omega_i)]^2+[B_{2n}(\omega_{i+1})-B_{2n}(\omega_i)]^2\quad(i=0,1,\dots,k-1)\tag{14}$$

are approximately distributed as $\tfrac14\{F_+(\omega_{i+1})-F_+(\omega_i)\}\chi^2_{(2)}$ independently of one another.
It should be noted that
(i) if $E[X(t)]$ is not specified a priori we may replace $X(r)$ in (8) and (9) by $X(r)-\bar X$, where $\bar X$ is the mean of the set of observations. This introduces an additional term in (8) which is of order $n^{-1/2}$, and hence for sufficiently large n does not affect the approximations (11) and (12);


† When 0 or $\pi$ are not continuity points of $F_+(\omega)$ we take $\omega_0>0$ or $\omega_k<\pi$.











260 Goodness. of fit test for spectral distribution functions 


(ii) the assumption that the number of observations is odd is not important since with an even number, $2n+2$ say, we can, in defining $A_{2n}(\omega)$ and $B_{2n}(\omega)$, either omit one of the end observations, the loss of information being asymptotically negligible, or make the limits of summation $-n$, $n+1$, which gives an additional term of order $n^{-1}$.

It does not seem possible to make any general statement about the distributions of $A_{2n}(\omega)$ and $B_{2n}(\omega)$, and hence those of the statistics (14), without the assumption that $\{X(t)\}$ is Normal. We might, for example, wish to consider the case of the general discrete linear process, defined by $X(t)=\sum_{u=-\infty}^{\infty}g(t-u)\,\epsilon(u)$, the $\epsilon(u)$ being mutually independent with a common distribution, but the limiting distributions of $A_{2n}(\omega)$, $B_{2n}(\omega)$ as $n\to\infty$ will clearly depend on the form of this distribution, e.g. for the independent process $\{X(t)\}=\{\epsilon(t)\}$, the mth cumulant of

$$A_{2n}(\omega)\to(2\pi)^{-m}\left[\omega^m+2\sum_{r=1}^{\infty}\left(\frac{\sin r\omega}{r}\right)^m\right]\kappa_m(\epsilon).$$





3. THE GOODNESS OF FIT CRITERIA 


Let the null hypothesis be that $\{X(t)\}$ has a specified spectral distribution function $F_+(\omega)$. Then we can construct statistics

$$T_i=4[\{A_{2n}(\omega_{i+1})-A_{2n}(\omega_i)\}^2+\{B_{2n}(\omega_{i+1})-B_{2n}(\omega_i)\}^2]/\{F_+(\omega_{i+1})-F_+(\omega_i)\}\quad(i=0,1,\dots,k-1),\tag{15}$$

which for sufficiently large n will, to a good approximation, be distributed independently as multiples of $\chi^2_{(2)}$, these multiples being unity on the null hypothesis, and either greater or less than unity on the alternative hypotheses. Equivalently $s_i^2=\tfrac12T_i$ will be approximately independent variance estimates each with $f_i=2$ degrees of freedom, the corresponding variances $\sigma_i^2$ being equal to unity on the null hypothesis and having arbitrary values on the alternative hypotheses. From this property we can easily derive suitable criteria for testing goodness of fit.
The most obvious one to use is the standard likelihood ratio criterion

$$M=-2[L-L_{\max}]=\sum_{i=0}^{k-1}f_i\,(s_i^2-1-\log s_i^2)=\sum_{i=0}^{k-1}(T_i-2\log T_i)+2k(\log2-1).\tag{16}$$

On the null hypothesis we have $M\sim\tfrac76\chi^2_{(k)}$, using the approximation due to Bartlett (1937) (which should be sufficiently accurate except perhaps for small k, when it may be advisable to use the better approximation, due to Box (1949), based on the standard F distribution). For large k it is worth while modifying the $\chi^2$ approximation to $M\sim1.118\,\chi^2_{(1.032k)}$, which, if we neglect the effect of the approximations of §2, reproduces the mean and variance of M exactly, $\kappa_3(M)$ with an error of about 3%, and $\kappa_4(M)$ with an error of about 8% (this is easily verified from the expression $k\{\log\Gamma(1-2t)-2t-(1-2t)\log(1-2t)\}$ for the cumulant generating function of M, $\log E(\exp Mt)$).
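The criterion (16) is simple to evaluate once the $T_i$ are available. The following is a minimal sketch; the function name and the input values are hypothetical and serve only to show that M vanishes when every $s_i^2=\tfrac12T_i$ equals unity and is positive otherwise.

```python
import math

def likelihood_ratio_criterion(t_values):
    """M of (16): sum(T_i - 2 log T_i) + 2k(log 2 - 1), for positive T_i."""
    k = len(t_values)
    return (sum(t - 2.0 * math.log(t) for t in t_values)
            + 2.0 * k * (math.log(2.0) - 1.0))

# Hypothetical values, for illustration only.
m_flat = likelihood_ratio_criterion([2.0] * 5)       # every s_i^2 equals 1
m_mixed = likelihood_ratio_criterion([0.5, 2.0, 6.0])
print(m_flat, m_mixed)
```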

An alternative criterion is obtained by assessing the significance of each $T_i$ separately, using a two-tailed test, and then combining the results by Fisher's method (see, for example, Fisher, 1941, p. 95). Thus if $p_i=\exp(-\tfrac12T_i)$, we let

$$y_i=\begin{cases}1-p_i&(p_i>\tfrac12),\\p_i&(p_i\le\tfrac12),\end{cases}$$

and refer $U=\sum_{i=0}^{k-1}u_i$, where $u_i=-2\log2y_i$, to a $\chi^2$ distribution with 2k degrees of freedom, high values being considered significant. With this procedure there is the advantage that from the individual values of the $p_i$, the intervals $(\omega_i,\omega_{i+1})$ for which there are highly significant departures of the numerators of (15) from their expectations on the null hypothesis are immediately identifiable. However, the form of the power function of the test may be unsatisfactory. This is indicated by its behaviour when k is large (but still of course $\ll n$). For it can be shown, using the Normal approximation to the distribution of U, that for alternative hypotheses with $\epsilon_i=(1/\sigma_i^2)-1$ of order $k^{-1/2}$, the power function is then approximately equal to

$$1-\Phi(\xi_\alpha+0.42\,\theta),\tag{17}$$

where $\Phi(x)=\int_{-\infty}^{x}e^{-\frac12y^2}\,dy/\sqrt{(2\pi)}$, $\xi_\alpha$ is such that $\Phi(\xi_\alpha)=1-\alpha$, $\alpha$ being the significance level of the test, and $\theta=k^{-1/2}\sum_{i=0}^{k-1}\epsilon_i$ (see Appendix, §1). Clearly the test is ineffective for such hypotheses with $\theta>0$, since then (17) $<\alpha$.
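The behaviour of the approximate power (17) can be seen from a short sketch. The function name and the particular deviations $\epsilon_i$ are hypothetical; the only ingredients taken from the text are $\theta=k^{-1/2}\sum\epsilon_i$ and the coefficient 0.42.

```python
import math
from statistics import NormalDist

def u_test_power(alpha, eps):
    """Approximate power (17) of the U test for deviations eps_i."""
    k = len(eps)
    theta = sum(eps) / math.sqrt(k)
    xi = NormalDist().inv_cdf(1.0 - alpha)   # Phi(xi) = 1 - alpha
    return 1.0 - NormalDist().cdf(xi + 0.42 * theta)

# Hypothetical alternatives with k = 16 and |eps_i| = k**-0.5 = 0.25.
power_null = u_test_power(0.05, [0.0] * 16)
power_pos = u_test_power(0.05, [0.25] * 16)    # theta > 0
power_neg = u_test_power(0.05, [-0.25] * 16)   # theta < 0
print(power_null, power_pos, power_neg)
```

With $\theta>0$ the value falls below the significance level, which is the defect of the unmodified test noted above.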

This difficulty is overcome by modifying the U test as follows. Let $U_1=\Sigma_1u_i$, $U_2=\Sigma_2u_i$, where $\Sigma_1$ denotes summation over the $k_1$ values of i for which $p_i>\tfrac12$, and $\Sigma_2$ summation over the remaining $k_2=k-k_1$ values for which $p_i\le\tfrac12$, and refer $U_1$, $U_2$ to $\chi^2$ distributions with $2k_1$ and $2k_2$ degrees of freedom respectively, rejecting the null hypothesis if either is significantly large. This modified criterion is, for large k, sensitive to deviations $\epsilon_i$ of order $k^{-1/2}$ giving positive or negative values of $\theta$ (Appendix, §2(9)). The comparison of the power functions of the modified U test and the M (likelihood ratio) test is in general a difficult problem, but the fact that, for large k, M is sensitive to deviations $\epsilon_i$ of order $k^{-1/4}$ (Appendix, §3(12)), supports the view that the modified U test is, on the whole, the more powerful.
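The U and modified U criteria may be sketched as follows. The function name and the two input values are hypothetical; the inputs are chosen so that $p_i=\tfrac34$ and $p_i=\tfrac14$, each of which gives $y_i=\tfrac14$ and hence $u_i=2\log2$.

```python
import math

def u_statistics(t_values):
    """Return (U, U1, U2, k1, k2); U1 collects the terms with p_i > 1/2."""
    u1 = u2 = 0.0
    k1 = k2 = 0
    for t in t_values:
        p = math.exp(-0.5 * t)
        if p > 0.5:
            u1 += -2.0 * math.log(2.0 * (1.0 - p))   # y_i = 1 - p_i
            k1 += 1
        else:
            u2 += -2.0 * math.log(2.0 * p)           # y_i = p_i
            k2 += 1
    return u1 + u2, u1, u2, k1, k2

# Hypothetical T values giving p = 3/4 and p = 1/4.
u, u1, u2, k1, k2 = u_statistics([2 * math.log(4 / 3), 2 * math.log(4)])
print(u, u1, u2, k1, k2)
```

$U_1$ and $U_2$ would then be referred to $\chi^2$ distributions with $2k_1$ and $2k_2$ degrees of freedom.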

There are of course other criteria which are more effective for particular types of alternative hypothesis. Among these may be mentioned $\max T_i$, the largest of the $T_i$, which would be appropriate for alternative hypotheses under which a large jump in the spectral distribution function (or a large peak in the spectral density function, its derivative) occurred at some point. The distribution of $\max T_i$ under the null hypothesis is obviously

$$\Pr\{\max T_i>x\}=1-(1-e^{-\frac12x})^k.$$
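The tail probability for $\max T_i$ follows from the independence of k variables each distributed as $\chi^2_{(2)}$, and may be evaluated directly. The function name below is illustrative.

```python
import math

def max_t_tail(x, k):
    """Pr{max T_i > x} for k independent chi-square(2) variables."""
    return 1.0 - (1.0 - math.exp(-0.5 * x)) ** k

# At x = 2 log 2 a single T_i exceeds x with probability 1/2.
tail_one = max_t_tail(2 * math.log(2), 1)
tail_two = max_t_tail(2 * math.log(2), 2)
print(tail_one, tail_two)
```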

These tests can also be applied when, as is usually the case, $F_+(\omega)$ is not completely specified under the null hypothesis, but contains a number of parameters which have to be estimated from the data, provided that consistent estimators of these parameters are available. For then the effect of the substitution of the estimators for the parameters occurring in the denominators of the $T_i$ is negligible when n is sufficiently large. For example, the variance of the series, $\sigma_X^2$, will seldom be given a priori, at most the normalized spectral distribution function $F_+(\omega)/F_+(\pi)$ being specified, and we can usually substitute the estimator

$$\hat\sigma_X^2=\sum_{r=-n}^{n}\{X(r)-\bar X\}^2/(2n),$$

which is certainly consistent, having a mean-square error of order $n^{-1}$, when $\sum_r|\rho(r)|$ converges. Again, if on the null hypothesis $\{X(t)\}$ is a linear autoregressive process, defined by the stationary solution of

$$X(t)+a_1X(t-1)+\dots+a_pX(t-p)=Y(t),$$

where the $Y(t)$ are independent Normal variables with zero mean and constant variance, and the roots of $z^p+a_1z^{p-1}+\dots+a_p=0$ have moduli less than unity, we can substitute the usual least squares estimates of the parameters $a_1,\dots,a_p$.

With the likelihood ratio test we can, in fact, avoid the estimation of $\sigma_X^2$ by taking the null hypothesis to be the equality of the $\sigma_i^2$, their common value being unspecified. M then becomes Bartlett's criterion for testing the homogeneity of a set of variance estimates. Again the test of the significance of the largest $T_i$ can be made independent of $\sigma_X^2$ by taking the criterion to be $g=\max T_i\big/\sum_{i=0}^{k-1}T_i$, whose distribution on the null hypothesis, given by

$$\Pr(g>x)=\sum_{r=1}^{[1/x]}(-)^{r-1}\binom{k}{r}(1-rx)^{k-1},$$

$[1/x]$ denoting the integral part of $1/x$, was first obtained by Fisher (1929).
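Fisher's distribution for g is a finite alternating sum and is easily computed. In the sketch below the function name is illustrative, and the upper limit is additionally capped at k, beyond which the binomial coefficients vanish.

```python
import math

def fisher_g_tail(x, k):
    """Pr(g > x) for g = max T_i / sum T_i with k terms, 0 < x <= 1."""
    if x >= 1.0:
        return 0.0
    upper = min(k, int(math.floor(1.0 / x)))
    return sum((-1) ** (r - 1) * math.comb(k, r) * (1.0 - r * x) ** (k - 1)
               for r in range(1, upper + 1))

# With k = 2 the larger share exceeds x (x > 1/2) with probability 2(1 - x).
g_two = fisher_g_tail(0.75, 2)
g_three = fisher_g_tail(0.5, 3)
print(g_two, g_three)
```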

It is worth noting that if the spectral density function $f_+(\omega)$ exists and is continuous, then when the difference between its upper and lower bounds in $(\omega_i,\omega_{i+1})$ is sufficiently small, we may, in calculating $T_i$, replace $F_+(\omega_{i+1})-F_+(\omega_i)$ by $(\omega_{i+1}-\omega_i)\,\tfrac12\{f_+(\omega_{i+1})+f_+(\omega_i)\}$.† The advantage of doing so lies in the fact that the process is then usually of the generalized autoregressive form for which

$$X(t)+a_1X(t-1)+\dots+a_pX(t-p)=Y(t)+b_1Y(t-1)+\dots+b_qY(t-q),$$

giving immediately

$$f_+(\omega)=(\sigma_Y^2/\pi)\,\bigl|(1+b_1e^{i\omega}+\dots+b_qe^{iq\omega})/(1+a_1e^{i\omega}+\dots+a_pe^{ip\omega})\bigr|^2;$$

although, since $f_+(\omega)$ is a rational function of $\cos\omega$, $F_+(\omega)$ can be obtained explicitly in terms of elementary functions, it may be quite a complicated expression.
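The density of the generalized autoregressive form can be evaluated directly from the coefficients. The sketch below uses an illustrative function name and argument order, with `sigma2_y` standing for the variance of $Y(t)$; the first-order example is hypothetical.

```python
import cmath
import math

def arma_density(omega, a, b, sigma2_y=1.0):
    """f_+(omega) = (sigma_Y^2/pi) |(1 + sum b_j e^{ij w}) / (1 + sum a_j e^{ij w})|^2."""
    z = cmath.exp(1j * omega)
    num = 1 + sum(bj * z ** (j + 1) for j, bj in enumerate(b))
    den = 1 + sum(aj * z ** (j + 1) for j, aj in enumerate(a))
    return (sigma2_y / math.pi) * abs(num / den) ** 2

# Independent series: the density is constant and equal to sigma_Y^2 / pi.
flat = arma_density(1.0, [], [])
# Hypothetical first-order case X(t) - 0.5 X(t-1) = Y(t), evaluated at omega = 0.
ar1_at_zero = arma_density(0.0, [-0.5], [])
print(flat, ar1_at_zero)
```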


4. COMPARISON WITH TESTS BASED ON PERIODOGRAM INTENSITIES 


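The periodogram intensity for angular frequency $\omega$ is given by

$$I_{2n+1}(\omega)=H_{2n+1}(\omega)\,H^*_{2n+1}(\omega),$$

$$\text{where}\quad H_{2n+1}(\omega)=\left(\frac{2}{2n+1}\right)^{1/2}\sum_{r=-n}^{n}e^{i\omega r}X(r).\tag{18}$$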


It has been shown that when the spectral density $f_+(\omega)$ exists and is continuous near $\omega_i$ $(i=1,2,\dots,k-1)$, the statistics $I_{2n+1}(\omega_i)/\pi f_+(\omega_i)$ have the same distributional properties as the $T_i$, i.e. for large n are, to a good approximation, distributed as $\chi^2_{(2)}$, provided that $\omega_{i+1}-\omega_i\gg\pi/n$ and some suitable condition is imposed on the autocorrelations $\rho(r)$; the convergence of $\sum_r|\rho(r)|$ is certainly sufficient (see, for example, Bartlett, 1950, p. 4). It might be thought that there would be a direct connexion between $T_i$ and $I_{2n+1}(\omega_i)$, but this is not so since

$$\{A_{2n}(\omega_2)-A_{2n}(\omega_1)\}^2+\{B_{2n}(\omega_2)-B_{2n}(\omega_1)\}^2=\frac{2n+1}{8\pi^2}\left|\int_{\omega_1}^{\omega_2}H_{2n+1}(\omega)\,d\omega\right|^2.$$


† Our requirement on the spacing of the $\omega_i$ for the results (11) and (12) to hold to a good approximation then becomes $\omega_{i+1}-\omega_i\gg\pi/n$.













Goodness of fit tests using these statistics may thus be obtained in the same way as those of §3. Such tests have been proposed by several authors; for example, the likelihood ratio test is discussed by Bartlett (1950, pp. 7-9), and the Fisher g test by Whittle (1952a, p. 47) and Sargan (1953). Their application need not be confined to the case of null hypotheses for which the spectral distribution function is absolutely continuous. For we can write $F_+(\omega)=F_+^{(1)}(\omega)+F_+^{(2)}(\omega)$, where $F_+^{(1)}(\omega)$ is absolutely continuous and $F_+^{(2)}(\omega)$ is a step function with a finite or enumerably infinite set of discontinuity points (cf. Bartlett, 1955, p. 163), and we can always choose the $\omega_i$ so that $f_+(\omega_i)$ exists (in practice this will mean that $f_+(\omega)$ is also continuous near $\omega_i$).

When $f_+(\omega)$ is a sufficiently smooth function the power of one of these periodogram intensity tests should be approximately equal to that of the corresponding test based on the $T_i$ with intervals $(\omega_i,\omega_{i+1})$ such that the periodogram frequencies are $\tfrac12(\omega_i+\omega_{i+1})$. For let the spectral density be $f_+^{(0)}(\omega)$ on the null hypothesis and $f_+^{(0)}(\omega)+f_+^{(1)}(\omega)$ on an alternative hypothesis. Then when the variation of $f_+^{(0)}$, $f_+^{(1)}$ with $\omega$ is approximately linear in $(\omega_i,\omega_{i+1})$, the expectations of the variance estimates $s_i^2$ on the alternative hypothesis become approximately equal to

$$1+f_+^{(1)}\!\left(\frac{\omega_i+\omega_{i+1}}{2}\right)\Big/f_+^{(0)}\!\left(\frac{\omega_i+\omega_{i+1}}{2}\right)$$

in both cases, and these determine the power functions. The tests of §3 should be better for detecting large peaks in $f_+(\omega)$ of unknown location. For with the intensity tests, such peaks will have appreciable probabilities of being detected only when they occur at frequencies whose differences from some $\omega_i$ are of order $n^{-1}$.

However, when $f_+(\omega)$ is continuous for all $\omega$, intensity tests of greatly increased power may be obtained by using frequencies $\omega_j=2\pi j/(2n+1)$ $(j=1,2,\dots,n)$ (cf. Whittle, 1952a; Sargan, 1953). For then the intensities $I_{2n+1}(\omega_j)$ and $I_{2n+1}(\omega_{j'})$ $(j'\ne j)$ are asymptotically uncorrelated, at least when (as will usually be the case) we can write $X(t)=\sum_{u=0}^{\infty}g(u)\,\epsilon(t-u)$, where the $\epsilon(t)$ are independent Normal variables with zero mean and constant variance $\sigma_\epsilon^2$, and $g(u)\to0$ exponentially as $u\to\infty$; this follows most easily from the asymptotic relation

$$I^{(X)}(\omega)\sim(\pi f_+(\omega)/\sigma_\epsilon^2)\,I^{(\epsilon)}(\omega)\tag{19}$$

between the intensities for $\{X(t)\}$ and $\{\epsilon(t)\}$ (see Bartlett, 1955, p. 279). It is reasonable to expect that (as assumed by Whittle and Sargan) the asymptotic distributions of the criteria will not be affected by the fact that there are now n intensities instead of a finite number k; this is easily demonstrated for the M and U tests, although it seems difficult to construct a satisfactory argument for the g test. We may also note that, as was pointed out by Sargan (1953, p. 148), the intensity tests can be used when $\{X(t)\}$ is a discrete linear process. This follows at once, since with $X(t)=\sum_{u=-\infty}^{\infty}g(t-u)\,\epsilon(u)$ we have the relation (19), and by the central limit theorem, the distributions of $n^{-1/2}\sum_{r=-n}^{n}\epsilon(r)\cos\omega r$ and $n^{-1/2}\sum_{r=-n}^{n}\epsilon(r)\sin\omega r$ tend to the Normal form with zero mean and variance unity, at least when the third absolute moment $E|\epsilon(t)|^3$ is finite.
We could alternatively consider, instead of individual intensities, integrals of the form

$$K_n(\omega_1,\omega_2)=\int_{\omega_1}^{\omega_2}I_{2n+1}(\omega)\,d\omega=2\sum_{s=-2n}^{2n}\left(1-\frac{|s|}{2n+1}\right)C_s\,(\sin\omega_2s-\sin\omega_1s)/s,\tag{20}$$

where $C_s=\sum_{r=1}^{2n+1-|s|}X(r)\,X(r+|s|)/(2n+1-|s|)$. Then for continuous $f_+(\omega)$ we have

$$\lim_{n\to\infty}E\{K_n(\omega_1,\omega_2)\}=2\pi\int_{\omega_1}^{\omega_2}f_+(\omega)\,d\omega,\tag{21}$$

$$\lim_{n\to\infty}n\,\mathrm{var}\,K_n(\omega_1,\omega_2)=4\pi\int_{\omega_1}^{\omega_2}\{\pi f_+(\omega)\}^2\,d\omega\tag{22}$$

and

$$\lim_{n\to\infty}n\,\mathrm{cov}\,\{K_n(\omega_1,\omega_2),K_n(\omega_3,\omega_4)\}=0,\tag{23}$$


when $(\omega_1,\omega_2)$ and $(\omega_3,\omega_4)$ do not overlap (see Grenander, 1951, pp. 521-3; Grenander & Rosenblatt, 1953, pp. 541-3). It can also be shown that under wide conditions (any which ensure the asymptotic normality of a finite set of serial covariances $C_s$ will suffice) the distribution of a finite set of quantities such as $n^{1/2}[K_n(\omega_1,\omega_2)-E\{K_n(\omega_1,\omega_2)\}]$ tends, as $n\to\infty$, to the Normal form with zero means and finite covariance matrix. Hence, provided that

$$E\{K_n(\omega_i,\omega_{i+1})\}-2\pi\int_{\omega_i}^{\omega_{i+1}}f_+(\omega)\,d\omega=o(n^{-1/2})\quad\text{as }n\to\infty,$$

which is certainly true if $f_+(\omega)$ has a bounded derivative (Grenander & Rosenblatt, 1953, Theorem 2), the statistics

$$v_i=n^{1/2}\left[K_n(\omega_i,\omega_{i+1})-2\pi\int_{\omega_i}^{\omega_{i+1}}f_+(\omega)\,d\omega\right]\Big/\left[4\pi\int_{\omega_i}^{\omega_{i+1}}\{\pi f_+(\omega)\}^2\,d\omega\right]^{1/2},\tag{24}$$

where the intervals $(\omega_i,\omega_{i+1})$ do not overlap, are for large n distributed approximately as independent Normal variables with zero mean and variance unity.


This enables us to construct goodness of fit tests based on the $v_i$; for example, we might use the criterion $\sum_{i=0}^{k-1}v_i^2$, rejecting the null hypothesis if this is significantly large when referred to a $\chi^2$ distribution with k degrees of freedom. Such tests should be much more powerful than those of §3 because the asymptotic variance of $K_n$ is not finite but proportional to $n^{-1}$. However, there is the disadvantage that when the null hypothesis does not specify $f_+(\omega)$ completely, the substitution of estimators for the unknown parameters may affect the asymptotic distribution of the test criteria. For the root-mean-square errors of these estimators will usually be of order $n^{-1/2}$, and an error of $n^{-1/2}$ in $f_+(\omega)$ produces an error in $v_i$ which may be of order unity.


5. NUMERICAL ILLUSTRATION 


To illustrate the methods described in §3, these were applied to test the goodness of fit of a set of observations from a particular series $\{X(t)\}$, a third-order linear autoregressive process with

$$X(t+2)-1.1X(t+1)+0.5X(t)=Y(t),\qquad Y(t+1)-0.1Y(t)=\epsilon(t),\tag{25}$$

the $\epsilon(t)$ being independent Normal variables with zero mean and variance unity. The set consisted of 385 consecutive terms out of a group of 500 given by M. G. Kendall (1949, Series 16), n thus being 192. The end-points of the intervals on the $\omega$ axis were taken to be $\omega_i=\tfrac1{24}i\pi$ $(i=0,1,\dots,24)$ so that each interval was of length $\tfrac1{24}\pi=8\pi/n$; this might be thought rather small, but an examination of the exact expressions for the variances and covariances of the quantities $A_{2n}(\omega_{i+1})-A_{2n}(\omega_i)$, $B_{2n}(\omega_{i+1})-B_{2n}(\omega_i)$ along the lines of §6 indicated that the $\chi^2$ approximations to the distributions of the $T_i$ should be fairly accurate. From (25), we have

$$f_+(\omega)=1/[\pi\,|(1-1.1e^{i\omega}+0.5e^{2i\omega})(1-0.1e^{i\omega})|^2]=1/[\pi(2\cos^2\omega-3.3\cos\omega+1.46)(1.01-0.20\cos\omega)].\tag{26}$$


The graph of $f_+(\omega)$ plotted against $\omega$ is shown in Fig. 1; it has a single fairly sharp peak at a value of $\omega$ approximately equal to $\tfrac{3}{16}\pi$. Also on integrating (26) we find


$$\pi F_+(\omega)=0.05643\tan^{-1}\!\left(\frac{11x}{9}\right)+0.79173\log\frac{2.6(x+0.34185)^2+0.09617}{2.6(x-0.34185)^2+0.09617}+3.33039\tan^{-1}\!\left(\frac{0.38465x}{0.15385-x^2}\right),\tag{27}$$

where $x=\tan(\tfrac12\omega)$.
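The increments of $\pi F_+(\omega)$ over the 24 intervals can also be obtained without the closed form, by numerical integration of the density (26). The sketch below uses Simpson's rule, which is a substitute for the explicit integral and not the method of the paper; the function names and the number of subdivisions are illustrative choices.

```python
import math

def density_26(w):
    """Spectral density f_+(w) in the second form of (26)."""
    c = math.cos(w)
    return 1.0 / (math.pi * (2 * c * c - 3.3 * c + 1.46) * (1.01 - 0.20 * c))

def simpson(f, lo, hi, m=200):
    """Composite Simpson rule with m (even) subintervals."""
    h = (hi - lo) / m
    s = f(lo) + f(hi)
    for j in range(1, m):
        s += (4 if j % 2 else 2) * f(lo + j * h)
    return s * h / 3

# pi * [F_+((i+1) pi/24) - F_+(i pi/24)] for i = 0, ..., 23
increments = [math.pi * simpson(density_26, i * math.pi / 24, (i + 1) * math.pi / 24)
              for i in range(24)]
total = sum(increments)
print(round(increments[0], 4), round(total, 4))
```

The first increment and the total may be compared with the entries 1.0220 and 10.5513 of Table 1; small differences in the last decimal place are to be expected from rounding.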


[Figure: spectral density plotted against $24\omega/\pi$ (horizontal axis, 0 to 24).]

Fig. 1. Spectral density function for Kendall's Series 16.


The values of $\pi[F_+(\omega_{i+1})-F_+(\omega_i)]$ for $i=0,1,\dots,23$ are given in Table 1. The corresponding values of

$$P_i=4\{A_{2n}(\omega_{i+1})-A_{2n}(\omega_i)\}^2/\{F_+(\omega_{i+1})-F_+(\omega_i)\},$$

$$Q_i=4\{B_{2n}(\omega_{i+1})-B_{2n}(\omega_i)\}^2/\{F_+(\omega_{i+1})-F_+(\omega_i)\},$$

and $T_i=P_i+Q_i$ are given in Table 2.

The likelihood ratio criterion $M=\sum_{i=0}^{23}\{T_i-2\log T_i\}+48(\log2-1)=24.34$, which is clearly not significantly large when referred to a $\chi^2$ distribution with 24 degrees of freedom, so that with the M test we should draw the correct conclusion that the observations are consistent with the spectral distribution function of the series being the specified $F_+(\omega)$. If the variance of $\epsilon(t)$ had not been specified, we would have used the homogeneity of variances criterion $M'=23.67$ with 23 degrees of freedom, which again is not significantly large (the difference $M-M'$, with 1 degree of freedom, which gives a test for $\sigma^2=1$ against alternatives $\sigma^2\ne1$, is also not significant).





Table 1. Distribution of spectral 'power' over intervals of length $\tfrac1{24}\pi$ for Kendall's Series 16

(The tabulated quantity is $\pi[F_+(\tfrac1{24}(i+1)\pi)-F_+(\tfrac1{24}i\pi)]$.)

  i  |  value   ||  i  |  value
  0  |  1.0220  ||  12 |  0.0763
  1  |  1.0953  ||  13 |  0.0576
  2  |  1.2440  ||  14 |  0.0448
  3  |  1.4421  ||  15 |  0.0361
  4  |  1.5558  ||  16 |  0.0299
  5  |  1.3799  ||  17 |  0.0255
  6  |  0.9734  ||  18 |  0.0223
  7  |  0.6029  ||  19 |  0.0199
  8  |  0.3668  ||  20 |  0.0183
  9  |  0.2309  ||  21 |  0.0171
  10 |  0.1523  ||  22 |  0.0164
  11 |  0.1056  ||  23 |  0.0161

  Total 10.5513























Table 2. Values of test statistics for Kendall's Series 16

  i  |  P_i    |  Q_i    |  T_i    ||  i  |  P_i    |  Q_i    |  T_i
  0  |  0.1949 |  2.2857 |  2.4806 ||  12 |  0.0001 |  1.4326 |  1.4327
  1  |  0.4505 |  0.6195 |  1.0700 ||  13 |  0.0053 |  2.8015 |  2.8068
  2  |  1.0072 |  2.8562 |  3.8634 ||  14 |  0.1729 |  0.0048 |  0.1777
  3  |  0.0134 |  0.0336 |  0.0470 ||  15 |  1.7855 |  0.8365 |  2.6220
  4  |  0.1471 |  0.6799 |  0.8270 ||  16 |  0.4132 |  0.7761 |  1.1893
  5  |  0.8415 |  0.1389 |  0.9804 ||  17 |  0.7310 |  0.0200 |  0.7510
  6  |  0.5004 |  4.8900 |  5.3904 ||  18 |  0.5207 |  0.7617 |  1.2824
  7  |  0.0069 |  0.2930 |  0.2999 ||  19 |  0.1436 |  4.1721 |  4.3157
  8  |  0.2438 |  2.5375 |  2.7813 ||  20 |  0.0001 |  2.3556 |  2.3557
  9  |  1.0291 |  0.4558 |  1.4849 ||  21 |  0.0027 |  0.0886 |  0.0913
  10 |  1.4156 |  0.5018 |  1.9174 ||  22 |  0.6150 |  0.8077 |  1.4227
  11 |  0.1182 |  0.1355 |  0.2537 ||  23 |  0.0400 |  0.4936 |  0.5336

  Total: P 10.3987, Q 29.9782, T 40.3769
Again we have

$$U=-2\sum_{i=0}^{23}\log2y_i=\sum_{T_i>2\log2}T_i-2\sum_{T_i<2\log2}\log(1-e^{-\frac12T_i})-48\log2=40.63,$$

with 48 degrees of freedom, and since there are 12 values of i with $T_i>2\log2$,

$$U_1=-2\sum_{T_i<2\log2}\log(1-e^{-\frac12T_i})-24\log2=24.39,\quad\text{with 24 degrees of freedom,}$$

$$\text{and}\quad U_2=\sum_{T_i>2\log2}T_i-24\log2=16.24,\quad\text{with 24 degrees of freedom.}$$

Neither $U$, $U_1$ nor $U_2$ is significantly large.
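The criteria quoted above can be recomputed from the $T_i$ column of Table 2. The sketch below is an arithmetical check only; the variable names are illustrative, and $U_1$ is taken, following the definition in §3, to collect the terms with $p_i>\tfrac12$, i.e. $T_i<2\log2$.

```python
import math

# T_i from Table 2, i = 0, ..., 23
T = [2.4806, 1.0700, 3.8634, 0.0470, 0.8270, 0.9804, 5.3904, 0.2999,
     2.7813, 1.4849, 1.9174, 0.2537, 1.4327, 2.8068, 0.1777, 2.6220,
     1.1893, 0.7510, 1.2824, 4.3157, 2.3557, 0.0913, 1.4227, 0.5336]
k = len(T)

# Likelihood ratio criterion (16)
M = sum(t - 2 * math.log(t) for t in T) + 2 * k * (math.log(2) - 1)

cut = 2 * math.log(2)
small = [t for t in T if t < cut]   # p_i > 1/2
large = [t for t in T if t > cut]   # p_i < 1/2
U1 = -2 * sum(math.log(1 - math.exp(-t / 2)) for t in small) - 2 * len(small) * math.log(2)
U2 = sum(large) - 2 * len(large) * math.log(2)
U = U1 + U2
print(round(M, 2), round(U1, 2), round(U2, 2), round(U, 2))
```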

As a further check, values of the corresponding criteria $U$, $U_1$, $U_2$ for the sets $\{P_i\}$ and $\{Q_i\}$ of approximately independent variables each distributed as $\chi^2_{(1)}$, were obtained. For the P set, $U_1=41.17$ (30 D.F.), $U_2=7.05$ (18 D.F.), $U=48.22$ (48 D.F.), and for the Q set, $U_1=13.95$ (14 D.F.), $U_2=30.79$ (34 D.F.), $U=44.74$ (48 D.F.). The values for the Q set agree closely with their expectations, but for the P set, $U_1$ is greater than the upper 10% point of $\chi^2_{(30)}$, while $U_2$ is only just greater than the lower 1% point of $\chi^2_{(18)}$. This indicates that the values of $P_i$ tend to be too low, which is confirmed by the fact that $\sum_{i=0}^{23}P_i=10.40$ is less than the lower 1% point of $\chi^2_{(24)}$. This effect is somewhat surprising. The exact expressions show that the values of $E(P_i)$ tend to be less than unity, but the negative bias does not seem to be large enough to provide a satisfactory explanation. A detailed examination of the accuracy of the $\chi^2$ approximations in this particular case might be of interest, but has not been carried out because of the extremely tedious calculations required. There is of course always the possibility that the effect might be traced to some peculiarity in the series of residuals $\{\epsilon(t)\}$ (compare the experience of Bartlett & Rajalakshman, 1953, p. 120).


6. DERIVATION OF THE APPROXIMATIONS OF §2 


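It is easy to show that

$$4\pi^2E[A_{2n}(\omega_2)-A_{2n}(\omega_1)]^2=\int_0^\pi[K_n^{12}(u)-L_n^{12}(u)]^2\,dF_+(u),\tag{28}$$

$$4\pi^2E[B_{2n}(\omega_2)-B_{2n}(\omega_1)]^2=\int_0^\pi[K_n^{12}(u)+L_n^{12}(u)]^2\,dF_+(u),\tag{29}$$

$$4\pi^2E[\{A_{2n}(\omega_4)-A_{2n}(\omega_3)\}\{A_{2n}(\omega_2)-A_{2n}(\omega_1)\}]=\int_0^\pi[K_n^{12}(u)-L_n^{12}(u)]\,[K_n^{34}(u)-L_n^{34}(u)]\,dF_+(u)\tag{30}$$

and

$$4\pi^2E[\{B_{2n}(\omega_4)-B_{2n}(\omega_3)\}\{B_{2n}(\omega_2)-B_{2n}(\omega_1)\}]=\int_0^\pi[K_n^{12}(u)+L_n^{12}(u)]\,[K_n^{34}(u)+L_n^{34}(u)]\,dF_+(u),\tag{31}$$

where

$$K_n^{12}(u)=K_n(u,\omega_1,\omega_2)=G_n(u-\omega_1)-G_n(u-\omega_2),\tag{32}$$

$$L_n^{12}(u)=L_n(u,\omega_1,\omega_2)=G_n(u+\omega_1)-G_n(u+\omega_2),\tag{33}$$

$G_n$ being defined by

$$G_n(u)=\int_0^u\frac{\sin(n+\tfrac12)y}{2\sin\tfrac12y}\,dy,\tag{34}$$

and $K_n^{34}(u)=K_n(u,\omega_3,\omega_4)$, $L_n^{34}(u)=L_n(u,\omega_3,\omega_4)$.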













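For example,

$$4\pi^2E[A_{2n}(\omega_2)-A_{2n}(\omega_1)]^2=\sum_{r,s=-n}^{n}\left(\frac{\sin r\omega_2-\sin r\omega_1}{r}\right)\left(\frac{\sin s\omega_2-\sin s\omega_1}{s}\right)\int_0^\pi\cos u(r-s)\,dF_+(u)$$

$$=\int_0^\pi\left|\sum_{r=-n}^{n}\bigl[e^{iur}(e^{i\omega_2r}-e^{-i\omega_2r}-e^{i\omega_1r}+e^{-i\omega_1r})/(2r)\bigr]\right|^2dF_+(u)$$

$$=\int_0^\pi[G_n(u+\omega_2)-G_n(u-\omega_2)-G_n(u+\omega_1)+G_n(u-\omega_1)]^2\,dF_+(u),$$

since

$$\sum_{r=-n}^{n}(e^{i\omega r}-1)/r=i\omega+2i\int_0^\omega\left(\sum_{r=1}^{n}\cos yr\right)dy=i\int_0^\omega\frac{\sin(n+\tfrac12)y}{\sin\tfrac12y}\,dy.\tag{35}$$

(29), (30) and (31) are obtained in the same way.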
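Now

$$G_n(u)=\int_0^u\frac{\sin(n+\tfrac12)y}{y}\,dy+\int_0^u\sin(n+\tfrac12)y\left\{\frac{1}{2\sin\tfrac12y}-\frac1y\right\}dy=\mathrm{Si}\,(n+\tfrac12)u+I_n(u),\ \text{say},\tag{36}$$

where by the Riemann-Lebesgue theorem (see, for example, Titchmarsh, 1944, p. 403), $I_n(u)\to0$ like $n^{-1}$ as $n\to\infty$. In fact, on integrating the second term of (36) by parts, we find that for $|u|\le\pi$,

$$|I_n(u)|\le\frac{1}{n+\tfrac12}\left(1-\frac2\pi\right)<\frac{0.37}{n}.\tag{37}$$

When the argument of $G_n$ exceeds $\pi$, as can occur in (33), it is convenient to use the relation $G_n(2\pi-u)=\pi-G_n(u)$, which follows at once from (36), and to write

$$G_n(u)=\pi-\mathrm{Si}\,(n+\tfrac12)(2\pi-u)+I_n(u),$$

where $I_n(u)$ still satisfies (37). Hence if

$$k_n^{12}(u)=k_n(u,\omega_1,\omega_2)=\mathrm{Si}\,(n+\tfrac12)(u-\omega_1)-\mathrm{Si}\,(n+\tfrac12)(u-\omega_2),$$

$$l_n^{12}(u)=l_n(u,\omega_1,\omega_2)=\mathrm{Si}\,(n+\tfrac12)(u+\omega_1)-\mathrm{Si}\,(n+\tfrac12)(u+\omega_2)\quad\text{for }0\le u\le\pi-\omega_2,$$

a similar definition applying for $u>\pi-\omega_2$ if $\mathrm{Si}\,(n+\tfrac12)(u+\omega_i)$ is replaced by $\pi-\mathrm{Si}\,(n+\tfrac12)(2\pi-u-\omega_i)$ when $u>\pi-\omega_i$ $(i=1,2)$, $k_n^{12}(u)$ and $l_n^{12}(u)$ will be, for moderately large n, good approximations to $K_n^{12}(u)$ and $L_n^{12}(u)$ respectively, and similarly $k_n^{34}(u)$ and $l_n^{34}(u)$ will be good approximations to $K_n^{34}(u)$ and $L_n^{34}(u)$.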

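Now $\mathrm{Si}\,(n+\tfrac12)u$ may be replaced to a good approximation by $\tfrac12\pi$ when $u\ge\lambda\pi/(n+\tfrac12)$, $\lambda=O(1)$. In fact, since

$$\int_u^\infty\frac{\sin y}{y}\,dy=\frac{\cos u}{u}+\frac{\sin u}{u^2}-2\int_u^\infty\frac{\sin y}{y^3}\,dy=\frac{\cos u}{u}+\frac{\sin u}{u^2}+\frac{\theta_u}{u^2}\quad\text{with }|\theta_u|<1,$$

we have, for $u\ge\lambda\pi/(n+\tfrac12)$,

$$|\mathrm{Si}\,(n+\tfrac12)u-\tfrac12\pi|<\frac{1}{\lambda\pi}+\frac{2}{\lambda^2\pi^2},$$

so that with $\lambda=2$, for example, the error in the approximation is not more than about 10%.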
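The accuracy of replacing $\mathrm{Si}(x)$ by $\tfrac12\pi$ can be checked numerically. The sketch below computes the sine integral by Simpson's rule (an illustrative choice, not part of the paper) and compares the error at $x=\lambda\pi$ with the bound $\phi(x)=1/x+2/x^2$, which is the form assumed here for the inequality used later in this section.

```python
import math

def sine_integral(x, steps=2000):
    """Si(x) = integral from 0 to x of sin(y)/y, by Simpson's rule (steps even)."""
    def f(y):
        return 1.0 if y == 0.0 else math.sin(y) / y
    h = x / steps
    s = f(0.0) + f(x)
    for j in range(1, steps):
        s += (4 if j % 2 else 2) * f(j * h)
    return s * h / 3

def phi(x):
    """Assumed bound on |Si(x) - pi/2| for x > 0."""
    return 1.0 / x + 2.0 / x ** 2

errors = {}
for lam in (1, 2, 4):
    x = lam * math.pi
    errors[lam] = abs(sine_integral(x) - math.pi / 2)
    print(lam, round(errors[lam], 4), round(phi(x), 4))

relative_error_lam2 = errors[2] / (math.pi / 2)
```

For $\lambda=2$ the relative error is a little under 10%, in agreement with the figure quoted above.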
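It follows that to a good approximation we can put

$$l_n^{12}(u)\approx0\quad(0\le u\le\pi),\quad\text{provided that }\omega_1\gg\pi/n,\ \pi-\omega_2\gg\pi/n,\tag{38}$$

and

$$k_n^{12}(u)\approx\begin{cases}0,&u<\omega_1-\lambda\pi/n,\ u>\omega_2+\lambda\pi/n,\\\pi,&\omega_1+\lambda\pi/n<u<\omega_2-\lambda\pi/n\end{cases}\quad(\text{for }\omega_2-\omega_1>2\lambda\pi/n).\tag{39}$$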











Hence (28) and (29) are both approximately equal to

$$\int_0^\pi\{k_n^{12}(u)\}^2\,dF_+(u),$$

and therefore to

$$\pi^2[F_+(\omega_2-\lambda\pi/n)-F_+(\omega_1+\lambda\pi/n)]+\sum_{i=1}^{2}\int_{\omega_i-\lambda\pi/n}^{\omega_i+\lambda\pi/n}\{k_n^{12}(u)\}^2\,dF_+(u),$$

from which the result (11) of §2 follows under the stated conditions.
Also when approximations similar to (38) and (39) can be used for $l_n^{34}(u)$ and $k_n^{34}(u)$, (30) and (31) are approximately equal to $\int_0^\pi k_n^{12}(u)\,k_n^{34}(u)\,dF_+(u)$, and, using Schwarz's inequality,




















ARGC LAO AOE NOLO LMC) SUA) 
where Via= | (ER WIPAR (Wy), Ta |" (etude (wy, 
so that for w,—w, > 2Am/n, 
[ROL AOLAG) Ealusa) (40) 
(40) will also hold for w;—, < 2Am/n if 
OL ACLAO IE (UA) 
which is certainly true when 
[earned (0) <P [Rt nyPaP, 0) Toe 


The results (12) of §2 then follow. 

The use of (11) and (12) when the end-point of an interval coincides with 0 or 7 can be 
justified by a similar argument. 

To make the above discussion rigorous we require expressions for upper bounds to the 
errors in the approximations. These are easily obtained, although their form is somewhat 
complicated. For example 


$$4\pi^2E\{A_{2n}(\omega_2)-A_{2n}(\omega_1)\}^2=\int_0^\pi\{k_n(u)-l_n(u)+\delta_n(u)\}^2\,dF_+(u)\tag{41}$$

(dropping the superscripts from k and l without ambiguity), where from (37),

$$|\delta_n(u)|<1.5/n\quad(0\le u\le\pi).$$

Now (41) is equal to

$$\int_0^\pi k_n^2(u)\,dF_+(u)+R_n^{(1)}+R_n^{(2)},\tag{42}$$

where

$$R_n^{(1)}=\int_0^\pi\delta_n^2(u)\,dF_+(u)+2\int_0^\pi\delta_n(u)\{k_n(u)-l_n(u)\}\,dF_+(u)\tag{43}$$

and

$$R_n^{(2)}=\int_0^\pi l_n^2(u)\,dF_+(u)-2\int_0^\pi l_n(u)\,k_n(u)\,dF_+(u).\tag{44}$$













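Also

$$\int_0^\pi k_n^2(u)\,dF_+(u)=\pi^2\{F_+(\omega_2)-F_+(\omega_1)\}+R_n^{(3)}+R_n^{(4)},\tag{45}$$

where

$$R_n^{(3)}=\left(\int_0^{\omega_1-\lambda\pi/(n+\frac12)}+\int_{\omega_2+\lambda\pi/(n+\frac12)}^{\pi}\right)k_n^2(u)\,dF_+(u)+\int_{\omega_1+\lambda\pi/(n+\frac12)}^{\omega_2-\lambda\pi/(n+\frac12)}\{k_n^2(u)-\pi^2\}\,dF_+(u)\tag{46}$$

and

$$R_n^{(4)}=\left(\int_{\omega_1-\lambda\pi/(n+\frac12)}^{\omega_1}+\int_{\omega_2}^{\omega_2+\lambda\pi/(n+\frac12)}\right)k_n^2(u)\,dF_+(u)+\left(\int_{\omega_1}^{\omega_1+\lambda\pi/(n+\frac12)}+\int_{\omega_2-\lambda\pi/(n+\frac12)}^{\omega_2}\right)\{k_n^2(u)-\pi^2\}\,dF_+(u)$$

$$=\int_{-\lambda\pi/(n+\frac12)}^{\lambda\pi/(n+\frac12)}W_n(\omega_2-\omega_1,u)\,\{dF_+(\omega_1+u)-dF_+(\omega_2-u)\},\tag{47}$$

where

$$W_n(\omega_2-\omega_1,u)=\begin{cases}k_n^2(\omega_1+u)=k_n^2(\omega_2-u)&(u<0),\\k_n^2(\omega_1+u)-\pi^2&(u>0).\end{cases}\tag{48}$$

Hence the magnitude of the relative error in the approximation does not exceed

$$\sum_{j=1}^{4}|R_n^{(j)}|\Big/\bigl[\pi^2\{F_+(\omega_2)-F_+(\omega_1)\}\bigr].\tag{49}$$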


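Upper bounds for the first three remainder terms (43), (44) and (46) may be derived by using the inequality

$$|\mathrm{Si}(u)-\tfrac12\pi|<\frac1u+\frac2{u^2}=\phi(u),\ \text{say}\quad(u>0).$$

This gives

$$|l_n(u)|\le\sum_{i=1}^{2}\phi[(n+\tfrac12)\min(\omega_i+u,\ 2\pi-\omega_i-u)]\le L\quad(0\le u\le\pi),$$

where $L=\sum_{i=1}^{2}\phi[(n+\tfrac12)\min(\omega_i,\ \pi-\omega_i)]$,

$$|k_n(u)|\le\sum_{i=1}^{2}\phi[(n+\tfrac12)(\omega_i-u)]<K_1,\quad u\le\omega_1-\lambda\pi/(n+\tfrac12),$$

where $K_1=4/(3\lambda\pi)+20/(9\lambda^2\pi^2)$ (assuming that $\omega_2-\omega_1>2\lambda\pi/n$),

$$|k_n(u)|\le\sum_{i=1}^{2}\phi[(n+\tfrac12)(u-\omega_i)]<K_1,\quad u\ge\omega_2+\lambda\pi/(n+\tfrac12),$$

and

$$|k_n(u)-\pi|\le\phi[(n+\tfrac12)(u-\omega_1)]+\phi[(n+\tfrac12)(\omega_2-u)]<K_2,\quad\omega_1+\lambda\pi/(n+\tfrac12)\le u\le\omega_2-\lambda\pi/(n+\tfrac12),$$

where $K_2=2/(\lambda\pi)+4/(\lambda^2\pi^2)$.

Hence we have

$$|R_n^{(1)}|<(1.5/n)^2F_+(\pi)+(3/n)\left[L\,F_+(\pi)+\int_0^\pi|k_n(u)|\,dF_+(u)\right],\tag{57}$$

$$|R_n^{(2)}|<L^2F_+(\pi)+2L\int_0^\pi|k_n(u)|\,dF_+(u),\tag{58}$$

with

$$\int_0^\pi|k_n(u)|\,dF_+(u)<1.2\pi\,[F_+(\omega_2+\lambda\pi/(n+\tfrac12))-F_+(\omega_1-\lambda\pi/(n+\tfrac12))]+K_1F_+(\pi)\tag{59}$$

($|k_n(u)|<1.2\pi$ for all u),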













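and

$$|R_n^{(3)}|<K_1^2[F_+(\pi)-F_+(\omega_2+\lambda\pi/(n+\tfrac12))+F_+(\omega_1-\lambda\pi/(n+\tfrac12))]+K_2(K_2+2\pi)[F_+(\omega_2-\lambda\pi/(n+\tfrac12))-F_+(\omega_1+\lambda\pi/(n+\tfrac12))]$$

$$<K_1^2F_+(\pi)+K_2(K_2+2\pi)[F_+(\omega_2)-F_+(\omega_1)].\tag{60}$$

For the fourth remainder term (47), we note that to a good approximation $W_n$ is independent of $\omega_2-\omega_1$, increasing from 0 to $\tfrac14\pi^2$ as u increases from $-\lambda\pi/(n+\tfrac12)$ to 0, and from $-\tfrac34\pi^2$ to 0 as u increases from 0 to $\lambda\pi/(n+\tfrac12)$. Hence $R_n^{(4)}$ is at most of the order of

$$\pi^2\sum_{i=1}^{2}[F_+(\omega_i+\lambda\pi/n)-F_+(\omega_i-\lambda\pi/n)].$$

In fact we have

$$0\le W_n(\omega_2-\omega_1,u)<\{\tfrac12\pi+\phi(\lambda\pi)\}^2\quad(u<0),$$

$$0\ge W_n(\omega_2-\omega_1,u)>\{\tfrac12\pi-\phi(\lambda\pi)\}^2-\pi^2\quad(u>0)$$

(provided that $\tfrac12\pi>\phi(\lambda\pi)$, which certainly holds for $\lambda\ge1$), so that

$$|R_n^{(4)}|<\max\bigl[\{\tfrac12\pi+\phi(\lambda\pi)\}^2\{F_+(\omega_1)-F_+(\omega_1-\lambda\pi/(n+\tfrac12))+F_+(\omega_2+\lambda\pi/(n+\tfrac12))-F_+(\omega_2)\},$$

$$\{\pi^2-(\tfrac12\pi-\phi(\lambda\pi))^2\}\{F_+(\omega_1+\lambda\pi/(n+\tfrac12))-F_+(\omega_1)+F_+(\omega_2)-F_+(\omega_2-\lambda\pi/(n+\tfrac12))\}\bigr],\tag{61}$$

or, when $\phi(\lambda\pi)$ can be neglected in comparison with $\tfrac12\pi$,

$$|R_n^{(4)}|<\pi^2\max\bigl[\tfrac14\{F_+(\omega_1)-F_+(\omega_1-\lambda\pi/n)+F_+(\omega_2+\lambda\pi/n)-F_+(\omega_2)\},\ \tfrac34\{F_+(\omega_1+\lambda\pi/n)-F_+(\omega_1)+F_+(\omega_2)-F_+(\omega_2-\lambda\pi/n)\}\bigr].\tag{62}$$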


From (57)-(62) we obtain an upper bound, U say, for (49). For U to be small, we certainly must take $\lambda$ to be appreciably greater than unity, since the term $K_2(K_2+2\pi)/\pi^2>2/(5\lambda)$ occurs; the value of this for $\lambda=4$, for example, is 0.12. However, $\lambda$ must also not be too large because of the contribution to U from (61) or (62), which in general will be small only for $\omega_2-\omega_1\gg\lambda\pi/n$. Thus the condition that U is to be small may impose a fairly severe restriction on the minimum admissible length of interval $\omega_2-\omega_1$. The contribution from the remaining terms in U, which is equal to

$$\{C_1F_+(\pi)+C_2[F_+(\omega_2+\lambda\pi/(n+\tfrac12))-F_+(\omega_1-\lambda\pi/(n+\tfrac12))]\}/\{F_+(\omega_2)-F_+(\omega_1)\},\tag{63}$$

where $C_1=[(L+1.5/n)^2+K_1(K_1+2L+3/n)]/\pi^2$, $C_2=1.2(2L+3/n)/\pi$, will usually be negligible unless $F_+(\omega_2)-F_+(\omega_1)$ is a very small fraction of $F_+(\pi)$; e.g. if $\lambda\ge4$, $\omega_1$ and $\pi-\omega_2\ge16\pi/n$, $n=200$, we have $C_1<0.003$ and $C_2<0.04$.

From the behaviour of the functions $W_n(u)$, $l_n(u)$ and $k_n(u)$ it is easy to see that U will often be a very conservative upper bound, so that the approximation may be quite accurate even when U is not small (this is the case, for instance, in the example of §5). However, to obtain a better estimate of the relative error it seems to be necessary to make some detailed assumptions about the behaviour of $F_+(\omega)$. For example, if $F_+(\omega)$ has a derivative $f_+(\omega)$ which is continuous over an interval covering $(\omega_1,\omega_2)$, we can show that the approximation may be quite good even when $\omega_2-\omega_1$ is only of order $\pi/n$. In fact if we have

$$\left|f_+(\omega)-f_+\!\left(\frac{\omega_1+\omega_2}{2}\right)\right|<\epsilon\quad\text{for}\quad\left|\omega-\frac{\omega_1+\omega_2}{2}\right|<\delta,$$

where $\delta\ge(\lambda+\tfrac12\mu)\pi/(n+\tfrac12)$, with $\mu=(n+\tfrac12)(\omega_2-\omega_1)/\pi$, and assume that $\epsilon$ may be neglected, we find that the contribution to the relative error of $R_n^{(4)}$ and the last integral in $R_n^{(3)}$, from which the dominant terms in U usually arise, is


A+tp ‘ 
(2/un8y |" iu + Ju) — Siu — um} du, (64) 


which is, for example, $<0.10$ when $\mu\ge4$, $\lambda\ge2$. [This follows quite simply by using the results

$$\int_0^yS(u)\,du=1+2\theta_1/y,\qquad\int_0^yS^2(u)\,du=\tfrac12\pi-1/2y+\theta_2/y^2\quad(|\theta_1|,|\theta_2|<1),$$

where $S(u)=\tfrac12\pi-\mathrm{Si}(u)$.]

All the above applies equally to the approximation to $4\pi^2E\{B_{2n}(\omega_2)-B_{2n}(\omega_1)\}^2$, since we only have to change the sign of $l_n(u)$ in (41). The expressions for the covariances, given by (30) and (31), may be dealt with in much the same way; we may note that for $\omega_2=\omega_3$, it can be shown, by considering an integral similar to (64), that taking these to be zero may be a good approximation for $\omega_2-\omega_1$, $\omega_4-\omega_3$ of order $\pi/n$ when $f_+(\omega)$ can be treated as constant over an interval containing $\omega_2$.


APPENDIX 


Asymptotic power functions for the tests of §3 
(1) The U test 


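We assume that to a sufficiently good approximation, $T_i=2\sigma_i^2x_i$, where the $x_i$ are distributed independently with probability density functions $e^{-x_i}$ $(0\le x_i<\infty)$. The $y_i$ will then be distributed independently with probability density functions

$$(1+\epsilon_i)\{y_i^{\epsilon_i}+(1-y_i)^{\epsilon_i}\}\quad(0\le y_i\le\tfrac12;\ i=0,1,\dots,k-1),\tag{1}$$

where $\epsilon_i=(1/\sigma_i^2)-1$.

Let $v_i=\tfrac12u_i=-\log2y_i$. Then from (1) we have

$$E(v_i^r)=(1+\epsilon_i)\left[(\tfrac12)^{1+\epsilon_i}\int_0^\infty v_i^re^{-(1+\epsilon_i)v_i}\,dv_i+\tfrac12\int_0^\infty v_i^re^{-v_i}(1-\tfrac12e^{-v_i})^{\epsilon_i}\,dv_i\right]$$

$$=(1+\epsilon_i)^{-r}(\tfrac12)^{1+\epsilon_i}\,\Gamma(r+1)+\tfrac12(1+\epsilon_i)\int_0^\infty v_i^re^{-v_i}\left\{1+\sum_{s=1}^{\infty}(-)^s\epsilon_i(\epsilon_i-1)\dots(\epsilon_i-s+1)(\tfrac12e^{-v_i})^s/s!\right\}dv_i$$

$$=\Gamma(r+1)\left[(1+\epsilon_i)^{-r}(\tfrac12)^{1+\epsilon_i}+\tfrac12(1+\epsilon_i)\left\{1+\sum_{s=1}^{\infty}\frac{(-)^s\epsilon_i(\epsilon_i-1)\dots(\epsilon_i-s+1)}{2^s\,s!\,(s+1)^{r+1}}\right\}\right]\tag{2}$$

(the interchange of the order of integration and summation being easily justified). Let the $\sigma_i^2$ have a positive lower bound and a finite upper bound. Then from (2) we easily see that the third absolute moments of the $v_i$ about their means have a finite upper bound and that the variances of the $v_i$ have a positive lower bound. Hence by Lyapunov's form of the central limit theorem (Cramér, 1946, p. 215), the distribution of $\{V-E(V)\}/\sqrt{(\mathrm{var}\,V)}$, where $V=\sum_{i=0}^{k-1}v_i=\tfrac12U$, tends to the Normal form with zero mean and variance unity when $k\to\infty$.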
















A. M. WALKER 273 


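Let $V_\alpha$ be the critical value of $V$ when the significance level of the test is $\alpha$. Then for large
$k$, $V_\alpha \simeq k+\xi_\alpha\sqrt{k}$, since $E(V)=k=\operatorname{var}V$ when $\epsilon_i=0$, and the power function $\Pr(V>V_\alpha)$ is
approximately equal to

$$1-\Phi[\{k+\xi_\alpha\sqrt{k}-E(V)\}/\sqrt{\operatorname{var}V}]. \tag{3}$$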


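Now when the $\epsilon_i$ are of order $k^{-1/2}$, (2) gives

$$E(V) = k-\tfrac12\sum_{i=0}^{k-1}\epsilon_i\Bigl\{\log 2+\sum_{s=1}^\infty\frac{1}{s\,2^s(s+1)^2}\Bigr\}+O(1) = k-0.42\sum_{i=0}^{k-1}\epsilon_i+O(1),$$

and $\operatorname{var}V=k+O(\sqrt{k})$. Hence (3) then becomes approximately equal to

$$1-\Phi\Bigl(\xi_\alpha+0.42\,k^{-1/2}\sum_{i=0}^{k-1}\epsilon_i\Bigr). \tag{4}$$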
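The numerical constant in (4) can be checked independently. The sketch below is not part of the paper: it assumes that the coefficient arises from setting $r=1$ in (2) and differentiating at $\epsilon_i=0$, and it evaluates the resulting series and the approximate power. The function names and the use of Python's `statistics.NormalDist` for $\Phi$ are illustrative choices.

```python
import math
from statistics import NormalDist


def first_order_coefficient(terms: int = 200) -> float:
    """Coefficient c in E(V) = k - c * sum(eps_i) + O(1).

    Assumed derivation: r = 1 in (2), differentiated at eps_i = 0, giving
    c = (1/2) * [log 2 + sum_{s>=1} 1 / (s * 2^s * (s + 1)^2)].
    """
    series = sum(1.0 / (s * 2**s * (s + 1) ** 2) for s in range(1, terms + 1))
    return 0.5 * (math.log(2.0) + series)


def u_test_power(xi_alpha: float, eps: list) -> float:
    """Approximate power (4): 1 - Phi(xi_alpha + c * k^(-1/2) * sum(eps_i))."""
    k = len(eps)
    c = first_order_coefficient()
    return 1.0 - NormalDist().cdf(xi_alpha + c * sum(eps) / math.sqrt(k))


if __name__ == "__main__":
    xi = NormalDist().inv_cdf(0.95)  # upper 5% point of the standard Normal
    print("coefficient:", first_order_coefficient())
    print("power under the null:", u_test_power(xi, [0.0] * 100))
    print("power with eps_i = -0.2:", u_test_power(xi, [-0.2] * 100))
```

With all $\epsilon_i=0$ the function returns the significance level, and negative $\epsilon_i$ (that is, $\sigma_i^2>1$) increase the power, as (4) indicates.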
(2) The modified U test 
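The probability density function of $y_i$ conditional on $p_i<\tfrac12$ is

$$(\tfrac12)^{-(1+\epsilon_i)}(1+\epsilon_i)\,y_i^{\epsilon_i}\qquad (0<y_i<\tfrac12).$$

Hence
$$E(v_i^r\mid p_i<\tfrac12)=(1+\epsilon_i)^{-r}\,\Gamma(r+1). \tag{5}$$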


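Similarly we have

$$E(v_i^r\mid p_i>\tfrac12)=\tfrac12\{1-(\tfrac12)^{1+\epsilon_i}\}^{-1}(1+\epsilon_i)\,\Gamma(r+1)\Bigl\{1+\sum_{s=1}^\infty\frac{(-1)^s\epsilon_i(\epsilon_i-1)\cdots(\epsilon_i-s+1)}{2^s\,s!\,(s+1)^{r+1}}\Bigr\}. \tag{6}$$

It then follows, by the central limit theorem, that when the $\sigma_i^2$ have a positive lower bound
and a finite upper bound, the distributions of $\{U_j-E(U_j)\}/\sqrt{\operatorname{var}U_j}$ $(j=1,2)$, conditional
on $k_1$ specified values of $i$ with $p_i>\tfrac12$, tend to the Normal form with zero mean and variance
unity when $k_1$ and $k_2=k-k_1\to\infty$.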

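Let $k$ be large and the $\epsilon_i$ be of order $k^{-1/2}$. Since

$$\Pr(p_i<\tfrac12)=(\tfrac12)^{1+\epsilon_i}=\tfrac12(1+\epsilon_i\log\tfrac12)+O(k^{-1}),$$

we have $E(k_1)=\tfrac12 k+O(\sqrt{k})$, $\operatorname{var}k_1=\tfrac14 k+O(1)$,
so that $k_1$, $k_2$ are $O(k)$. The conditional probabilities $\pi_j=\Pr\{U_j>2(k_j+\lambda\sqrt{k_j})\}$ are then given
to a good approximation by

$$\pi_1=1-\Phi(\lambda-0.16\,k_1^{-1/2}\,\Sigma_1\epsilon_i),\qquad \pi_2=1-\Phi(\lambda+k_2^{-1/2}\,\Sigma_2\epsilon_i) \tag{7}$$

($\Sigma_1$, $\Sigma_2$ having the same meaning as in §3), since from (5) and (6),

$$E(\Sigma_1v_i)=k_1+0.16\,\Sigma_1\epsilon_i+O(1),\qquad \operatorname{var}\Sigma_1v_i=k_1+O(\sqrt{k}),$$

and
$$E(\Sigma_2v_i)=k_2-\Sigma_2\epsilon_i+O(1),\qquad \operatorname{var}\Sigma_2v_i=k_2+O(\sqrt{k}).$$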


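Now let $2(k_j+\lambda_\alpha\sqrt{k_j})$ be the critical value of $U_j$ for the modified $U$ test. From (7) with
$\epsilon_i=0$, $\lambda_\alpha=\Phi^{-1}\{(1-\alpha)^{1/2}\}$, so that when the significance level $\alpha\ll1$, $\lambda_\alpha\simeq\xi_{\alpha/2}$. The power
function of the test is

$$E(\pi_1+\pi_2-\pi_1\pi_2), \tag{8}$$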




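where the expectation is taken over all possible sets of values of $i$. Although (8) is a very
complicated expression, its asymptotic form is easily found. For

$$E(\Sigma_2\epsilon_i)=\sum_{i=0}^{k-1}(\tfrac12)^{1+\epsilon_i}\epsilon_i=\tfrac12\sum_{i=0}^{k-1}\epsilon_i+O(1),$$

$$\operatorname{var}\Sigma_2\epsilon_i=\sum_{i=0}^{k-1}(\tfrac12)^{1+\epsilon_i}\{1-(\tfrac12)^{1+\epsilon_i}\}\epsilon_i^2=\tfrac14\sum_{i=0}^{k-1}\epsilon_i^2+O(k^{-1/2}),$$

so that $\pi_1\simeq1-\Phi(\lambda-0.16\,\theta/\sqrt2)$, $\pi_2\simeq1-\Phi(\lambda+\theta/\sqrt2)$,
where $\theta=k^{-1/2}\sum_{i=0}^{k-1}\epsilon_i$. Hence (8) is approximately equal to

$$1-\Phi(\lambda-0.16\,\theta/\sqrt2)\,\Phi(\lambda+\theta/\sqrt2). \tag{9}$$

Clearly (9) approaches unity when $\theta$ takes sufficiently large positive or negative values.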
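As an illustration of how (9) behaves, the following sketch evaluates the approximate power of the modified $U$ test for a given significance level and a given value of $\theta$. It is a sketch under stated assumptions rather than the author's computation: $\lambda$ is taken as $\Phi^{-1}\{(1-\alpha)^{1/2}\}$, the constant 0.16 is taken from (7), and the function name is hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard Normal distribution, used for Phi and its inverse


def modified_u_power(alpha: float, theta: float) -> float:
    """Approximate power (9) of the modified U test.

    alpha: significance level; theta: k^(-1/2) * sum(eps_i).
    lambda is chosen so that the power equals alpha when theta = 0.
    """
    lam = STD_NORMAL.inv_cdf(math.sqrt(1.0 - alpha))
    lower = STD_NORMAL.cdf(lam - 0.16 * theta / math.sqrt(2.0))
    upper = STD_NORMAL.cdf(lam + theta / math.sqrt(2.0))
    return 1.0 - lower * upper


if __name__ == "__main__":
    for theta in (-6.0, -2.0, 0.0, 2.0, 6.0, 40.0):
        print(theta, modified_u_power(0.05, theta))
```

The asymmetry between the two factors means that the power rises much faster for negative $\theta$ than for positive $\theta$ of the same size, although it tends to unity in both directions.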


(3) The M test 


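With $s_i^2=\sigma_i^2x_i$, the cumulant generating function of $M$ becomes

$$\begin{aligned}
\log E(\exp M\phi) &= \sum_{i=0}^{k-1}\log\Bigl[e^{-2\phi}\int_0^\infty e^{-x_i(1-2\phi\sigma_i^2)}(\sigma_i^2x_i)^{-2\phi}\,dx_i\Bigr]\\
&= k\{\log\Gamma(1-2\phi)-2\phi\}-\sum_{i=0}^{k-1}\bigl[2\phi\log\sigma_i^2+(1-2\phi)\log(1-2\phi\sigma_i^2)\bigr]\\
&= k\Bigl[\sum_{r=1}^\infty\frac{(-2\phi)^r}{r!}\psi^{(r-1)}(1)-2\phi\Bigr]+\sum_{i=0}^{k-1}\Bigl[(1-2\phi)\sum_{r=1}^\infty\frac{(2\phi\sigma_i^2)^r}{r}-2\phi\log\sigma_i^2\Bigr],
\end{aligned}$$

where $\psi^{(r-1)}(x)=(d/dx)^r\log\Gamma(x)$. Hence the cumulants of $M$ are given by

$$\kappa_1(M)=-2k\psi(1)+2\sum_{i=0}^{k-1}(\sigma_i^2-1-\log\sigma_i^2), \tag{10}$$

$$\kappa_r(M)=(-2)^rk\,\psi^{(r-1)}(1)+2^r(r-2)!\sum_{i=0}^{k-1}(\sigma_i^2)^{r-1}\{(r-1)\sigma_i^2-r\}\qquad (r>1). \tag{11}$$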


It is easily seen that when the $\sigma_i^2$ and their reciprocals have a finite upper bound, the
distribution of $M$ tends to the Normal form as $k\to\infty$. Also if the $\epsilon_i$ are of order $k^{-1/2}$, we
have from (10) and (11),

$$E(M)=1.15k+\sum_{i=0}^{k-1}\epsilon_i^2+O(k^{-1/2}),$$

and $\operatorname{var}M=2.58k+O(1)$.
Thus for large $k$, $M_\alpha\simeq1.15k+\xi_\alpha\sqrt{(2.58k)}$, $M_\alpha$ being the critical value of $M$ for significance
level $\alpha$, and the power function $\Pr(M>M_\alpha)$ is approximately equal to

$$1-\Phi\Bigl\{\xi_\alpha-\sum_{i=0}^{k-1}\epsilon_i^2\Big/\sqrt{(2.58k)}\Bigr\}. \tag{12}$$
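The constants 1.15 and 2.58 can be recovered from (10) and (11) with all $\sigma_i^2=1$, using the standard values $\psi(1)=-\gamma$ (Euler's constant) and $\psi'(1)=\pi^2/6$. The sketch below does this and evaluates the approximate power (12); the names are illustrative and the block is a check on the arithmetic, not the author's calculation.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329   # Euler's constant; psi(1) = -EULER_GAMMA
TRIGAMMA_AT_1 = math.pi**2 / 6.0   # psi'(1)

# Per-term mean and variance of M under the null hypothesis (sigma_i^2 = 1):
MEAN_PER_TERM = 2.0 * EULER_GAMMA          # -2 * psi(1), from (10)
VAR_PER_TERM = 4.0 * TRIGAMMA_AT_1 - 4.0   # (-2)^2 * psi'(1) + 4 * (1 - 2), from (11) with r = 2


def m_test_power(xi_alpha: float, eps: list) -> float:
    """Approximate power (12): 1 - Phi(xi_alpha - sum(eps_i^2) / sqrt(2.58 k))."""
    k = len(eps)
    shift = sum(e * e for e in eps) / math.sqrt(VAR_PER_TERM * k)
    return 1.0 - NormalDist().cdf(xi_alpha - shift)


if __name__ == "__main__":
    print(MEAN_PER_TERM, VAR_PER_TERM)
    xi = NormalDist().inv_cdf(0.95)
    print(m_test_power(xi, [0.2] * 100))
```

Because only $\epsilon_i^2$ enters (12), the approximate power of the $M$ test does not depend on the signs of the $\epsilon_i$.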




REFERENCES


BARTLETT, M. S. (1937). Proc. Roy. Soc. A, 160, 268.

BARTLETT, M. S. (1950). Biometrika, 37, 1.

BARTLETT, M. S. (1954). Publ. Inst. Statist. Univ. Paris, 3, Fasc. 3, p. 119.

BARTLETT, M. S. (1955). Stochastic Processes. Cambridge University Press.

BARTLETT, M. S. & RAJALAKSHMAN, D. V. (1953). J. R. Statist. Soc. B, 15, 107.

BOX, G. E. P. (1949). Biometrika, 36, 317.

CRAMÉR, H. (1946). Mathematical Methods of Statistics. Princeton.

DOOB, J. L. (1952). Stochastic Processes. New York: Wiley.

FISHER, R. A. (1929). Proc. Roy. Soc. A, 125, 54.

FISHER, R. A. (1941). Statistical Methods for Research Workers, 8th ed. Edinburgh: Oliver and Boyd.

GRENANDER, U. (1951). Ark. Mat. 1, 503.

GRENANDER, U. & ROSENBLATT, M. (1953). Ann. Math. Statist. 24, 537.

KENDALL, M. G. (1949). Biometrika, 36, 267.

SARGAN, J. D. (1953). J. R. Statist. Soc. B, 15, 140.

TITCHMARSH, E. C. (1944). Theory of Functions, 2nd ed. Oxford University Press.

WHITTLE, P. (1951). Hypothesis Testing in Time Series Analysis. Uppsala.

WHITTLE, P. (1952a). Trab. Estadistica, 3, 43.

WHITTLE, P. (1952b). Biometrika, 39, 309.

WHITTLE, P. (1954). Appendix to A Study in the Analysis of Stationary Time Series, by H. Wold,
2nd ed. Uppsala.



















[ 276 } 


SUFFICIENCY CONDITIONS IN REGULAR MARKOV CHAINS 
AND CERTAIN RANDOM WALKS 


By J. GANI 
Nuffield Fellow, Statistical Laboratory, The University of Manchester* 


For positively regular Markov chains with a finite number of states, transition probabilities of the form
$$p_{ij}(\theta)=\alpha_{ij}\exp\{K_{ij}A_1(\theta)+A_2(\theta)\},$$
are known to admit a sufficient estimator of $\theta$ in realizations of the chain starting with a fixed state
and consisting of a fixed number of transitions.

This paper considers whether transition probabilities of the same form will admit a sufficient
estimator of $\theta$ in other finite regular, but not positively regular, Markov chains. For chains with an
irreducible subset of two or more states, in which a realization starts from a fixed state and consists of
a fixed number of transitions, these probabilities are found to admit a maximum-likelihood estimator
of the function $g(\theta)=-A_2'(\theta)/A_1'(\theta)$, which is sufficient and unbiased.

There is some difference in chains with an absorbing state, in which realizations start from a fixed
state but continue until the absorbing state is reached; in this sequential case, the maximum-likelihood
estimator, with the number of transitions in the realization, together provide a sufficient estimator of
the function $g(\theta)$, which in general is no longer unbiased.

We restrict ourselves to the particular case where a certain linear relation is satisfied. This gives rise
to some simple stochastic matrices admitting a sufficient estimator of $\theta$, which consist of probabilities
of the forms $\theta$ and $1-\theta$; in some of these cases, unbiased sufficient estimators of $\theta$ reduce to known
results of Girshick, Mosteller & Savage (1946).

Some non-regular finite and infinite chains with absorbing states, associated with random walks,
whose matrices consist of various patterns of probabilities $\theta$ and $1-\theta$, are also found to admit a
sufficient estimator of $\theta$. The paper ends with the examination of such an example, arising in sequential
estimation.


1. INTRODUCTION 


In a recent paper (Gani, 1955), the author considered some sufficiency conditions holding
for positively regular Markov chains with a finite number $s$ of states $E_1,\dots,E_s$; for these
chains, stationary probabilities exist and are all positive.

It was shown that, for a realization of the chain starting with a fixed initial state and
consisting of a fixed number $n$ of transitions, a sufficient estimator of $\theta$, the parameter
defining the transition probabilities $p_{ij}(\theta)=\Pr\{E_j\mid E_i\}$ $(i,j=1,\dots,s)$, existed if in any row
$i$ of the stochastic matrix $\mathbf{p}=\{p_{ij}\}$, these probabilities were of the form

$$p_{ij}(\theta)=\alpha_{ij}\exp\{K_{ij}A_1(\theta)+A_2(\theta)\}\qquad (j=1,\dots,s), \tag{1}$$

where the constants $\alpha_{ij}>0$, and the number $r$ of distinct exponents $K_{ij}$ in the row could be
less than or equal to the number $s$ of states. Since $\sum_{j=1}^{s}p_{ij}=1$, the functions $A_1(\theta)$ and $A_2(\theta)$
were related by the equation

$$\exp\{-A_2(\theta)\}=\sum_{j=1}^{s}\alpha_{ij}\exp\{K_{ij}A_1(\theta)\}; \tag{2}$$

from this it followed that the transition probabilities in the remaining rows of the matrix,
except possibly for coefficients, were also given by the $r$ distinct forms of the $p_{ij}(\theta)$ corresponding
to the $r$ distinct values of the $K_{ij}$ in row $i$.


* The greater part of this work was completed while the author was at the Australian National 
University, Canberra A.C.T., Australia. 






















J. GANI 277 


The question arises whether a form similar to (1), for the transition probabilities admitting
a sufficient estimator of $\theta$, exists when a chain is regular, that is, with stationary probabilities
some of which may be zero. It is known (Bartlett, 1955, §2.21; Feller, 1950, §15.5)
that regular chains, other than those which are positively regular, have stochastic matrices
the simplest of which may be written as

$$\mathbf{p}=\begin{pmatrix}S&0\\Q&R\end{pmatrix}, \tag{3}$$

where $S$, a square submatrix of transition probabilities depending on $\theta$, constitutes a single
irreducible closed subset, such that the states in it form a positively regular chain. The
submatrices $Q$ and $R$ also consist of transition probabilities depending on $\theta$, and $0$ is a
submatrix of zero elements. The case of particular interest, which arises in sequential
problems, is that in which the irreducible closed set consists of a single 'absorbing' state
permitting entry but no exit; $S$ is then the single element 1.

In the two cases where the closed set consists (i) of two or more states none of which is 
closed, or (ii) of a single absorbing state, we shall consider two distinct types of realizations 
of the chains. For the first, exactly as with positively regular chains, we shall take realiza- 
tions consisting of a fixed number n of transitions and starting from a fixed initial state; 
these may end with any one of the accessible states in the chain, nor does the possible entry 
of the system into the closed set terminate the process of transition from one state to 
another of it. For the second case, however, a realization starting from a fixed initial state 
and consisting of a similar fixed number n of transitions could result in the absorbing state 
being reached in x <n transitions. In such cases, as is usual in sequential problems, we 
consider realizations in which the process is allowed to run until the absorbing state is 
reached; the number of transitions x required for this to happen is then itself a random 
variable. We proceed to examine these cases in greater detail. 


2. CHAINS WITH A SINGLE IRREDUCIBLE CLOSED SUBSET OF TWO OR MORE STATES 


Let us assume that the stochastic matrix (3) with a single irreducible closed subset of $t\geq2$
states can be written

$$\mathbf{p}=\begin{pmatrix}S&0\\Q&R\end{pmatrix}=\left(\begin{array}{ccc|ccc}
p_{11}&\cdots&p_{1t}&0&\cdots&0\\
\vdots&&\vdots&\vdots&&\vdots\\
p_{t1}&\cdots&p_{tt}&0&\cdots&0\\ \hline
p_{t+1,1}&\cdots&p_{t+1,t}&p_{t+1,t+1}&\cdots&p_{t+1,s}\\
\vdots&&\vdots&\vdots&&\vdots\\
p_{s1}&\cdots&p_{st}&p_{s,t+1}&\cdots&p_{ss}
\end{array}\right), \tag{4}$$

where the transition probabilities are $p_{ij}=p_{ij}(\theta)$, some of which may be zero, and where
the states $E_1,\dots,E_t$ form a positively regular chain. It is always possible in a sufficiently
large number of transitions to pass from any state $E_{t+1},\dots,E_s$ into the subset of states
$E_1,\dots,E_t$; once the system has entered this closed subset no exit from it is possible.
Consider now a realization of the same kind as that for positively regular chains, that is,
one starting with a fixed initial state and consisting of a fixed number $n$ of transitions. If
we started with one of $E_1,\dots,E_t$, we should be dealing with the case of the positively regular











278 Sufficiency conditions in regular Markov chains 


chain already investigated; we therefore begin with one of H,,,, ..., #,, and obtain a realiza- 
tion S of the n + 1 states 2g eee 


for which the likelihood function is

$$L(S)=\sum_{i,j=1}^{s}n_{ij}\ln p_{ij},$$

where the $n_{ij}$ are frequencies of transition from $E_i$ to $E_j$, such that $\sum_{i,j=1}^{s}n_{ij}=n$.
It can be verified directly that the form (1) for the $p_{ij}(\theta)$ will satisfy, exactly as in the
positively regular case, the sufficiency conditions for the maximum-likelihood estimator
$T$ of $\theta$. For the likelihood function is then

$$L(S)=\sum_{i,j=1}^{s}n_{ij}K_{ij}A_1(\theta)+nA_2(\theta)+\sum_{i,j=1}^{s}n_{ij}\ln\alpha_{ij},$$

and gives the estimator $T$ of $\theta$ as

$$\sum_{i,j}n_{ij}K_{ij}/n=-A_2'(T)/A_1'(T)=g(T),$$

so that $L(S)$ is clearly factorizable as

$$L(S)=n\{g(T)A_1(\theta)+A_2(\theta)\}+\sum_{i,j=1}^{s}n_{ij}\ln\alpha_{ij}. \tag{5}$$

It follows that $T$ is sufficient; moreover, as in Rao (1952, §4a.3) it can be shown that
$$\sum_{i,j}n_{ij}K_{ij}/n=-A_2'(T)/A_1'(T)$$
is an unbiased estimator of $g(\theta)=-A_2'(\theta)/A_1'(\theta)$, so that
$$E\Bigl(\sum_{i,j}n_{ij}K_{ij}/n\Bigr)=g(\theta).$$
A point of minor interest which follows from the structure of the matrix (4) is that the
number of distinct forms for the $p_{ij}(\theta)$ in any one of the rows $1,\dots,t$ is $r\leq t$; it follows that
there can only be $r\leq t$ distinct forms for the $p_{ij}$ in the remaining rows $t+1,\dots,s$ of the
matrix.

3. CHAINS WITH A SINGLE ABSORBING STATE 
A typical stochastic matrix with an absorbing state $E_1$ can be written as

$$\mathbf{p}=\begin{pmatrix}1&0&\cdots&0\\p_{21}&p_{22}&\cdots&p_{2s}\\\vdots&\vdots&&\vdots\\p_{s1}&p_{s2}&\cdots&p_{ss}\end{pmatrix}, \tag{6}$$

where the transition probabilities $p_{ij}=p_{ij}(\theta)$, some of which may be zero, are such that if
a realization starting with one of $E_2,\dots,E_s$ is allowed to continue, $E_1$ must eventually be
reached, the total number of transitions in the process being $x$, a random variable whose
distribution depends on $\theta$. We refer to this type of realization as sequential.

Let such a realization result in the sequence S’ of states 


By Bp 5 By Be 





the likelihood function is

$$L(S')=\sum_{i,j}n_{ij}\ln p_{ij}\qquad (i=2,\dots,s;\ j=1,\dots,s),$$

where the $n_{ij}$ are transition frequencies from $E_i$ to $E_j$, such that $\sum_{i,j}n_{ij}=x$. If the transition
probabilities $p_{ij}(\theta)$ except for $p_{11}=1$ are of the form (1), the likelihood function

$$L(S')=\sum_{i,j}n_{ij}K_{ij}A_1(\theta)+xA_2(\theta)+\sum_{i,j}n_{ij}\ln\alpha_{ij} \tag{7}$$

will give the maximum-likelihood estimator $T_x$ of $\theta$ as

$$\sum_{i,j}n_{ij}K_{ij}/x=-A_2'(T_x)/A_1'(T_x)=g(T_x). \tag{8}$$

Since $x$ is itself a random variable, the likelihood function $L(S')$ of (7) is not factorizable,
as was $L(S)$ of equation (5), nor is $T_x$ a sufficient estimator of $\theta$.

A lemma due to E. Fay, quoted by Lehmann (1950), and proved under somewhat
different conditions by Blackwell (1947), enables us to deduce that $(T_x,x)$ is a sufficient
statistic* for $\theta$. The lemma states that:

If for each value $m$ of $x$, $T_m$ is a sufficient statistic for $\theta$ in the sample of fixed size $m=\sum_{i,j}n_{ij}$,
then $(T_x,x)$ will be a sufficient statistic of $\theta$ in the sequential case.

For in the sequence $S'$, starting with $E_i$, the conditional probability of the sequence,
given that $T_x=t$ and $x=m$, is

$$\begin{aligned}
\Pr\{S'\mid T_x=t,\,x=m\} &= \frac{\Pr\{S'\}}{\sum_{S'}\Pr\{S'\}}
= \frac{\exp\{mg(t)A_1(\theta)+mA_2(\theta)+\sum_{i,j}n_{ij}\ln\alpha_{ij}\}}{\sum_{S'}\exp\{mg(t)A_1(\theta)+mA_2(\theta)+\sum_{i,j}n_{ij}\ln\alpha_{ij}\}}\\
&= \frac{\exp\{\sum_{i,j}n_{ij}\ln\alpha_{ij}\}}{\sum_{S'}\exp\{\sum_{i,j}n_{ij}\ln\alpha_{ij}\}},
\end{aligned} \tag{9}$$

where $\sum_{S'}$ indicates summation over all possible realizations $S'$ beginning with $E_i$, consisting
of $m$ transitions, and for which $T_x=t$. Since the probability (9) is independent of $\theta$, $(T_x,x)$
is a sufficient statistic for $\theta$ in the sequential case.

In general, however, the maximum-likelihood estimator (8) of $g(\theta)$ will no longer be
unbiased. We shall not consider general methods of finding unbiased sufficient estimators
of $g(\theta)$, but restrict our discussion to the simple case where $\sum_{i,j}n_{ij}K_{ij}$ and $x$ are linearly
related. We show that in some of these cases, unbiased sufficient estimators reduce to those
found by Girshick et al. (1946) for certain sequential binomial problems.


4. REGULAR CHAINS FOR WHICH $\sum_{i,j}n_{ij}K_{ij}$ AND $x$ ARE LINEARLY RELATED


Particularly simple results follow when there is a linear relation between the statistic
$\sum_{i,j}n_{ij}K_{ij}$ and the number of transitions $x$ in a sequential realization. Let the relation be

$$\sum_{i,j}n_{ij}K_{ij}=Ax+B, \tag{10}$$


* Although apparently not in universal use, the term is self-explanatory; we follow Lehmann and 
others in employing it. 











where $A$ and $B$ are constants independent of $x$; we see that the factorizability condition for
the likelihood function (7) will now hold, since

$$L(S')=x\{AA_1(\theta)+A_2(\theta)\}+BA_1(\theta)+\sum_{i,j}n_{ij}\ln\alpha_{ij}, \tag{11}$$

and the maximum-likelihood estimator $T_x$ of $\theta$ is given by

$$-\{BA_1'(T_x)\}/\{AA_1'(T_x)+A_2'(T_x)\}=x.$$

It is easily shown that

$$E(x)=-\{BA_1'(\theta)\}/\{AA_1'(\theta)+A_2'(\theta)\}=h(\theta),$$

so that $x$ is an unbiased estimate of $h(\theta)$. We consider some simple examples as illustrations.
The simplest stochastic matrix with an absorbing state is that for the two-state case,

$$\mathbf{p}=\begin{pmatrix}1&0\\1-\theta&\theta\end{pmatrix}=\begin{pmatrix}1&0\\\alpha_{21}\exp\{K_{21}A_1+A_2\}&\alpha_{22}\exp\{K_{22}A_1+A_2\}\end{pmatrix}, \tag{12}$$

where $K_{21}=0$, $K_{22}=1$. For a realization in $x$ transitions, the system starting from $E_2$
must move from $E_2$ to $E_1$ only in the last transition. The relation (10), which is

$$\sum_{i,j}n_{ij}K_{ij}=(x-1)K_{22}+K_{21}=x-1,$$

is clearly satisfied, and the maximum-likelihood estimator $T_x=(x-1)/x$ of $\theta$ sufficient.
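The two-state case is easy to simulate. The sketch below assumes the chain starts in $E_2$, remains there with probability $\theta$ and is absorbed in $E_1$ with probability $1-\theta$ at each step, so that $x$ has a geometric distribution with mean $1/(1-\theta)$. Function names and the seed are illustrative only.

```python
import random


def absorption_time(theta: float, rng: random.Random) -> int:
    """Number of transitions x until the chain moves from E_2 to E_1."""
    x = 1
    while rng.random() < theta:  # the chain stays in E_2 with probability theta
        x += 1
    return x


def mle_theta(x: int) -> float:
    """Maximum-likelihood estimator T_x = (x - 1) / x for the two-state chain."""
    return (x - 1) / x


def mean_absorption_time(theta: float, runs: int, seed: int = 0) -> float:
    """Monte Carlo estimate of E(x), to be compared with 1 / (1 - theta)."""
    rng = random.Random(seed)
    return sum(absorption_time(theta, rng) for _ in range(runs)) / runs


if __name__ == "__main__":
    theta = 0.6
    print(mean_absorption_time(theta, 200_000), 1.0 / (1.0 - theta))
    print(mle_theta(4))
```

The simulated mean of $x$ should lie close to $1/(1-\theta)$, which is the unbiasedness of $x$ for $h(\theta)$ in this special case.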
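The three-state case, with the stochastic matrix of the form

$$\mathbf{p}=\begin{pmatrix}1&0&0\\
\alpha_{21}\exp\{K_{21}A_1+A_2\}&\alpha_{22}\exp\{K_{22}A_1+A_2\}&\alpha_{23}\exp\{K_{23}A_1+A_2\}\\
\alpha_{31}\exp\{K_{31}A_1+A_2\}&\alpha_{32}\exp\{K_{32}A_1+A_2\}&\alpha_{33}\exp\{K_{33}A_1+A_2\}\end{pmatrix},$$

is known to have the same two or three distinct values of the exponents $K_{ij}$ in each row,
with some of the $\alpha_{ij}$ possibly zero. It is found, on considering realizations of the chain
starting from a fixed state $E_i$ $(i=2,3)$, that the linear relation (10) will hold only for
stochastic matrices

$$\mathbf{p}=\begin{pmatrix}1&0&0\\1-\theta&\alpha\theta&(1-\alpha)\theta\\1-\theta&\beta\theta&(1-\beta)\theta\end{pmatrix}\qquad (0<\alpha,\beta<1), \tag{13}$$

when it takes the form $\sum_{i,j}n_{ij}K_{ij}=x-1$, or

$$\mathbf{p}=\begin{pmatrix}1&0&0\\1-\theta&\theta&0\\0&1-\theta&\theta\end{pmatrix}, \tag{14}$$

when it is $\sum_{i,j}n_{ij}K_{ij}=x-r+1$, depending on the initial state $E_r$ $(r=2,3)$.

The matrices (13) and (14) are exhaustive for the three-state case, but it is difficult to
treat the general case of the chain with $s$ states in a systematic manner. Nevertheless, it
can easily be verified that linear relations (10) hold for stochastic matrices

$$\mathbf{p}=\begin{pmatrix}1&0&\cdots&0\\1-\theta&\alpha_{22}\theta&\cdots&\alpha_{2s}\theta\\\vdots&\vdots&&\vdots\\1-\theta&\alpha_{s2}\theta&\cdots&\alpha_{ss}\theta\end{pmatrix}\qquad \Bigl(0<\alpha_{ij}<1;\ \sum_{j=2}^{s}\alpha_{ij}=1,\ i=2,\dots,s\Bigr), \tag{15}$$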







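which are a generalization of (13), or

$$\mathbf{p}=\begin{pmatrix}1&0&0&\cdots&0\\1-\theta&\theta&0&\cdots&0\\0&1-\theta&\theta&\cdots&0\\\vdots&&\ddots&\ddots&\vdots\\0&0&\cdots&1-\theta&\theta\end{pmatrix}, \tag{16}$$

a generalization of (14), or also stochastic matrices of the mixed form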
ons. ' " ‘ ‘ 
e, 
} 
(12) 1-0 Op eee 7.0 0 
p = 
1 1-0 #6 0 
0 1-0 @ 0 
7 0 dees Rise, aie ‘ee, see, ae 7 
k 
j=2 
If we assume that the initial state in a realization is the last state $E_s$, the stochastic
matrices (16), of which (12) and (14) are particular cases when $s$ is 2 and 3 respectively,
correspond to part of a curtailed single sampling scheme of Girshick et al. (1946, §3B). In
the matrix (16), $1-\theta$ is the probability of finding a defective in the sample, and sampling
continues until $s-1$ defectives are observed. The likelihood function of a realization $S'$ is

$$L(S')=(s-1)\ln(1-\theta)+(x-s+1)\ln\theta,$$

and the maximum-likelihood estimator of $\theta$

$$T_x=1-(s-1)/x.$$

This, however, is biased; for, from the probability of a realization

$$\Pr(S')=\binom{x-1}{s-2}(1-\theta)^{s-1}\theta^{x-s+1}\qquad (x=s-1,\dots),$$

we can obtain the moment-generating function of $x$

$$M(t)=E(e^{tx})=(1-\theta)^{s-1}(e^{-t}-\theta)^{-(s-1)},$$

and from it, by first integrating and then substituting $t=0$, we find

$$\Bigl[\int_{-\infty}^{t}M(t)\,dt\Bigr]_{t=0}=E(x^{-1})=(-1)^{s-1}(1-\theta)^{s-1}\theta^{-s+1}\ln(1-\theta)+\sum_{r=2}^{s-1}\frac{(-1)^{s-r-1}}{r-1}\,\theta^{r-s}(1-\theta)^{s-r},$$

so that the expectation of $T_x$ is not $\theta$.
Girshick et al. (1946) give the unbiased sufficient estimator of $\theta$ for values of $s>2$ as

$$\hat\theta=1-(s-2)/(x-1).$$

If we consider

$$\Bigl[\int_{-\infty}^{t}e^{-t}M(t)\,dt\Bigr]_{t=0}=E\{(x-1)^{-1}\}=(1-\theta)/(s-2),$$













we can verify directly that the expectation of the estimator $\hat\theta$ is $\theta$. The case of $s=2$ is somewhat
special; Girshick et al. have shown that here, $\hat\theta$ can only be 0 or 1.

The previous method for finding unbiased sufficient estimators of $\theta$ does not apply to
matrices of the forms (15) or (17); for these, the necessary condition that the probability
of different paths in a realization of $x$ transitions be the same does not hold.
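The bias of the maximum-likelihood estimator and the unbiasedness of the estimator of Girshick et al. for the matrix (16) can be checked by direct summation. The sketch below assumes that $x$ follows the negative binomial law with probability function $\binom{x-1}{s-2}(1-\theta)^{s-1}\theta^{x-s+1}$ for $x\geq s-1$, and truncates the infinite sum at a hypothetical cut-off `x_max`; all names are illustrative.

```python
import math


def pmf(x: int, s: int, theta: float) -> float:
    """Pr(x transitions) when sampling stops at the (s - 1)th defective."""
    return math.comb(x - 1, s - 2) * (1 - theta) ** (s - 1) * theta ** (x - s + 1)


def expectation(fn, s: int, theta: float, x_max: int = 4000) -> float:
    """Truncated expectation of fn(x); the neglected tail is negligible for moderate theta."""
    return sum(fn(x) * pmf(x, s, theta) for x in range(s - 1, x_max + 1))


def mle(x: int, s: int) -> float:
    """Maximum-likelihood estimator T_x = 1 - (s - 1) / x."""
    return 1 - (s - 1) / x


def unbiased(x: int, s: int) -> float:
    """Estimator 1 - (s - 2) / (x - 1), defined for s > 2."""
    return 1 - (s - 2) / (x - 1)


if __name__ == "__main__":
    s, theta = 4, 0.7
    print("E(T_x)       =", expectation(lambda x: mle(x, s), s, theta))
    print("E(theta_hat) =", expectation(lambda x: unbiased(x, s), s, theta))
```

For any admissible pair of values the second expectation should reproduce $\theta$ to rounding error, while the first falls short of it.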


5. NON-REGULAR FINITE, AND INFINITE CHAINS SATISFYING THE LINEAR RELATION 


The condition (10) that the estimator of $\theta$ in regular chains with an absorbing state be
sufficient, may apply equally to some non-regular or infinite chains with absorbing states,
for which realizations start from some fixed state $E_i$ and continue until an absorbing state
is reached in $x$ transitions. Suppose that, associated with all non-absorbing states $E_i$,
there are only two distinct values of the transition probabilities $p_{ij}$: $\theta$ and $1-\theta$, the remaining
values being zero. Then, grouping together the $n_{ij}$ for which the probabilities are $\theta$, and also
those for which they are $1-\theta$, so that

$$\sum_{\theta}n_{ij}=n_1,\qquad \sum_{1-\theta}n_{ij}=n_2,\qquad n_1+n_2=x,$$

the likelihood function of the realization is simply

$$L=n_1\ln\theta+n_2\ln(1-\theta), \tag{18}$$

and the maximum-likelihood estimator of $\theta$

$$T_x=n_1/(n_1+n_2)=n_1/x.$$

Providing the linear relation (10), which reduces to

$$n_1=Ax+B, \tag{19}$$

holds, the likelihood function (18) is factorizable as in (11), and the estimator $T_x$ is sufficient.
A non-regular finite chain with $s$ states which satisfies these conditions is the random
walk with two absorbing end states, with the stochastic matrix

$$\mathbf{p}=\begin{pmatrix}1&0&&&&\\1-\theta&0&\theta&&&\\0&1-\theta&0&\theta&&\\&&\ddots&\ddots&\ddots&\\&&&1-\theta&0&\theta\\&&&&0&1\end{pmatrix}.$$

For a realization starting with $E_i$ $(i=2,\dots,s-1)$, and ending with $E_1$ after $x$ transitions,
the linear relation (19) is

$$n_1=\tfrac12(x-i+1), \tag{20}$$

and the estimator of $\theta$,

$$T_x=\tfrac12+\tfrac12(1-i)/x, \tag{21}$$

will be sufficient. A similar result will hold if the process ends with $E_s$.
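Relation (20) is deterministic along every path that ends in $E_1$, which a short simulation makes visible. The sketch below assumes a walk on states $1,\dots,s$ that moves up one state with probability $\theta$ and down one state with probability $1-\theta$ until it reaches state 1 or state $s$; the function names are illustrative.

```python
import random


def run_walk(start: int, s: int, theta: float, rng: random.Random):
    """Run the walk from `start` until absorption; return (end state, n_1, x).

    n_1 counts the transitions made with probability theta (upward steps),
    x counts all transitions.
    """
    state, n_up, x = start, 0, 0
    while 1 < state < s:
        if rng.random() < theta:
            state += 1
            n_up += 1
        else:
            state -= 1
        x += 1
    return state, n_up, x


def relation_holds(start: int, s: int, theta: float, runs: int, seed: int = 0) -> bool:
    """Check n_1 = (x - start + 1) / 2 on every simulated path ending in state 1."""
    rng = random.Random(seed)
    for _ in range(runs):
        end, n_up, x = run_walk(start, s, theta, rng)
        if end == 1 and 2 * n_up != x - start + 1:
            return False
    return True


if __name__ == "__main__":
    print(relation_holds(start=3, s=6, theta=0.4, runs=1000))
```

The check follows from counting steps: a path from $E_i$ to $E_1$ makes $i-1$ more downward than upward steps, so $n_2-n_1=i-1$ and $n_1+n_2=x$.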
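The sufficiency conditions also hold for certain infinite random walks with one absorbing
state; for the chain with the infinite stochastic matrix

$$\mathbf{p}=\begin{pmatrix}1&0&&&\\1-\theta&0&\theta&&\\0&1-\theta&0&\theta&\\&&\ddots&\ddots&\ddots\end{pmatrix}\qquad (\theta<\tfrac12), \tag{22}$$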







in which a realization starts with $E_i$ $(i=2,\dots)$ and ends with $E_1$ after $x$ steps, the relation
(20) between $n_1$ and $x$ holds again, and ensures the sufficiency of the estimator (21) of $\theta$.
Another infinite random walk is represented by 


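$$\mathbf{p}=\begin{pmatrix}1&0&0&0&0&\cdots\\1-\theta&0&0&\theta&0&\cdots\\0&1-\theta&0&0&\theta&\cdots\\\vdots&&\ddots&&&\ddots\end{pmatrix},$$

which for a similar realization will lead to the linear relation

$$n_1=\tfrac13(x-i+1),$$

which again ensures the sufficiency of the estimator

$$T_x=\tfrac13+(1-i)/(3x)$$

of $\theta$.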

Several illustrative examples of the same kind can be constructed; for each, the method of
Girshick et al. would lead to an unbiased estimator of $\theta$ which would be a function of the
sufficient estimator $T_x$. We restrict ourselves to a single example mentioned by Moran,
which reduces to a case already described.

Moran (1953) considers the infinite random walk with an absorbing barrier for which 
the states are H_,,...,H,...,H,,..., where H_, is the absorbing state. The stochastic matrix 
for this chain is (22); the process starts from HZ,, and proceeds until either H_, is reached in 
x<n steps, or any of the other available states HL; (i= —(s—1),...,0,...,”) is reached in 
n steps. The likelihood functions in these two cases are respectively given as 


L, = 4(ax—s)m6+4(~+s8)In(1-9), LD, = 4(n—s+i)Ind+3(n+s—7)m(1—86). 
The maximum-likelihood estimator T,, of # is found to be 
7.,= {TP = (x—s)|(2x) or T® = (n—s+i)/(2n)}, 
where, since T2>4—<s8/(2n), T2 = 4-—8/(2x) <4—8/(2n) <T?, 
the estimator 7, = {7 or T} will itself specify the appropriate likelihood function, without 


the need to state whether absorption has occurred or not. We verify that the likelihood 
functions in the two cases can be written as 


L, = $8(1—27%)- na(1—0)—4s8na1—-8)4, L, = nT? in A(1—0)1+nIn(1-8), 
where the first is effectively the form (18), and the second the ordinary binomial case. The 
estimator 7', of 0 is sufficient, but 7? is not unbiased. 


The method of Girshick et al. (1946) leads to the sufficient unbiased estimator 
T' ={T® or T} of 0, where these are 


po - 1 -@- +9) nw law o a) a (yn -2-a) 














2s(a—1) ” n )-( = ) 
i(n—s+i)) \4(n-s-i) 
From the expressions (n- ) ; 7 ) 
po = 1 S-D{- 2724+} na _ \eTP) \n(l-T?)-s ; 
ae —279)1= 1) (a te) 
nT? (na —T?)—8 


it is clear that the unbiased estimator is a function of the maximum-likelihood estimator T,,. 













I wish to thank the unknown referee of an earlier paper (Gani, 1955) for suggestions which
have been incorporated in §5, and also Profs. P. A. Moran and M. S. Bartlett, and Dr H.
Ruben for criticisms of an earlier draft of the paper.


REFERENCES


BARTLETT, M. S. (1955). An Introduction to Stochastic Processes. Cambridge University Press.

BLACKWELL, D. (1947). Conditional expectation and unbiased sequential estimation. Ann. Math.
Statist. 18, 105-10.

FELLER, W. (1950). An Introduction to Probability Theory and its Applications. New York: John Wiley.

GANI, J. (1955). Some theorems and sufficiency conditions for the maximum likelihood estimator of
an unknown parameter in a simple Markov chain. Biometrika, 42, 342-59.

GIRSHICK, M. A., MOSTELLER, F. & SAVAGE, L. J. (1946). Unbiased estimates for certain binomial
sampling problems with applications. Ann. Math. Statist. 17, 13-23.

LEHMANN, E. L. (1950). Notes on the Theory of Estimation. Lectures delivered at the University of
California.

MORAN, P. A. P. (1953). The estimation of the parameters of a birth and death process. J. R. Statist.
Soc. B, 15, 241-5.

RAO, C. R. (1952). Advanced Statistical Methods in Biometric Research. New York: John Wiley.













[ 285 ] 


SOME ASYMPTOTIC DISTRIBUTION THEORY FOR MARKOV
CHAINS WITH A DENUMERABLE NUMBER OF STATES†


By CYRUS DERMAN 
Columbia University, New York 


The joint asymptotic distribution is derived for certain functions of the sample realizations of a 
Markov chain with denumerably many states, from which the joint asymptotic distribution theory of 
estimates of the transition probabilities is obtained. Application is made to a goodness of fit test. 


1. INTRODUCTION 


Let $X_0, X_1, X_2,\dots$ be a sequence of random variables which assume only non-negative
integer values‡ and which have the property that

$$\Pr\{X_{m+n}=j\mid X_m=i,X_{m-1},\dots,X_0\}=\Pr\{X_{m+n}=j\mid X_m=i\}=\Pr\{X_n=j\mid X_0=i\}$$

for all integers $m, n>0$ and all states $i$ and $j$ for which the conditional probability is defined.
The distribution of $X_0$ will be fixed but arbitrary. Sequences of this type are known as
Markov chains with a denumerable number of states and stationary transition probabilities.
The fundamentals of the theory were laid down by Kolmogorov (1936). Feller (1950) has
also given an exposition of the main results of the theory.

The probability given above is called the $n$th step transition probability. We shall denote
it by $p_{ij}^{(n)}$. When $n=1$ we shall simply write $p_{ij}^{(1)}=p_{ij}$.

Suppose such a model is assumed and it is of interest to estimate or test hypotheses about
the probabilities $p_{ij}$ on the basis of a set of observations $x_1,\dots,x_n$. We shall develop here some
asymptotic distribution theory suitable for this purpose. Results of a similar nature were
given by Bartlett (1951). However, he assumed a Markov chain with only a finite number of
states. Anderson & Goodman (1956) also considered inference problems for such chains.
Our main tool will be a central limit theorem of Doeblin§ (1938) for denumerable Markov
chains.

In the remainder of the introduction we shall state, without proof, those results from the
general theory which we shall need. The $n$th step transition probabilities satisfy the relation

$$p_{ij}^{(n)}=\sum_{k}p_{ik}\,p_{kj}^{(n-1)}\qquad (i,j=0,1,\dots;\ n=2,3,\dots).$$

More generally

$$p_{ij}^{(n+m)}=\sum_{k}p_{ik}^{(n)}\,p_{kj}^{(m)}\qquad (i,j=0,1,\dots;\ n,m=1,2,\dots).$$

The latter equations are known as the Chapman-Kolmogorov equations.
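The Chapman-Kolmogorov equations can be illustrated numerically. The paper is concerned with denumerable chains; the sketch below uses a hypothetical three-state transition matrix purely so that the matrix products are finite, and the helper names are illustrative.

```python
def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def n_step(p, n):
    """n-step transition matrix p^(n), built by repeated one-step multiplication."""
    result = p
    for _ in range(n - 1):
        result = mat_mul(result, p)
    return result


def max_abs_diff(a, b):
    """Largest absolute entrywise difference between two matrices."""
    return max(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


# Hypothetical one-step transition probabilities p_ij (each row sums to 1).
P = [[0.5, 0.5, 0.0],
     [0.2, 0.3, 0.5],
     [0.0, 0.4, 0.6]]

if __name__ == "__main__":
    lhs = n_step(P, 5)                        # p^(2+3)
    rhs = mat_mul(n_step(P, 2), n_step(P, 3)) # sum_k p_ik^(2) p_kj^(3)
    print(max_abs_diff(lhs, rhs))
```

The two sides agree up to floating-point rounding, and every row of the $n$-step matrix still sums to one.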


† Work supported in part by the United States Air Force through the Office of Scientific Research
of the Air Research and Development Command.

‡ Each integer denotes a possible state of the Markov chain.

§ I am indebted to Dr T. E. Harris for calling my attention to the applicability of Doeblin's theorem
in connexion with this paper.











286 Some asymptotic distribution theory for Markov chains 


An important notion in Markov chain theory is that of the number of transitions necessary
to reach a certain state for the first time given initially a particular state. We let

$$f_{ij}^{(n)}=\Pr\{X_{m+n}=j;\ X_{m+v}\neq j\ \text{for}\ 1\leq v<n\mid X_m=i\}\qquad (m,i,j=0,1,\dots;\ n=1,2,\dots);$$

$$f_{ij}^{*}=\sum_{v=1}^{\infty}f_{ij}^{(v)},\qquad m_{ij}=\sum_{v=1}^{\infty}v\,f_{ij}^{(v)},\qquad \sigma_{ij}^{2}=\sum_{v=1}^{\infty}(v-m_{ij})^2f_{ij}^{(v)}\quad (m_{ij}<\infty).$$

We call $m_{ij}$ the mean first passage time from state $i$ to state $j$. If $i=j$, $m_{ii}$ is called the mean
recurrence time of state $i$. States $i$ and $j$ are said to belong to the same class if $f_{ij}^{*}f_{ji}^{*}>0$,
i.e. if there exist integers $n(i,j)$ and $n(j,i)$ such that $p_{ij}^{(n(i,j))}>0$ and $p_{ji}^{(n(j,i))}>0$. A state $i$ is
called recurrent if $f_{ii}^{*}=1$. This implies that $f_{ij}^{*}=1$ for all $j$ belonging to the same class as $i$.
If $f_{ii}^{*}<1$, $i$ is called transient. If $i$ is recurrent and $j$ belongs to the same class as $i$, then $j$ is
also recurrent. Consequently, all states of a class are either all recurrent or all transient.
We can then speak of a class as being recurrent or transient. If all states belong to the same
class we refer to the Markov chain as being irreducible recurrent (transient). Even though
it might not always be explicitly stated, we shall deal throughout only with irreducible
chains. If $f_{ii}^{*}=1$ and $m_{ii}<\infty$, then $i$ is said to be a positive state; if $m_{ii}=\infty$, $i$ is called
null. If $i$ is positive (null) and $j$ belongs to the same class, then $j$ is positive (null). Thus
the states of a recurrent class will all be positive or all null.

If $f_{ii}^{*}=1$, $m_{ii}<\infty$, then $\lim_{n\to\infty}\frac{1}{n}\sum_{v=1}^{n}p_{ij}^{(v)}=\frac{1}{m_{jj}}$ for all states $j$ belonging to the same class as $i$.
If $m_{jj}=\infty$, then $\lim_{n\to\infty}p_{ij}^{(n)}=0$. If all states belong to the same positive class, then

$$\sum_{i=0}^{\infty}\frac{1}{m_{ii}}=1\qquad\text{and}\qquad \sum_{i=0}^{\infty}\frac{1}{m_{ii}}\,p_{ij}=\frac{1}{m_{jj}}\quad\text{for } j=0,1,\dots.$$

We call a state $i$ periodic with period $t$ if $t$ is the greatest common divisor of all $n$ such that
$p_{ii}^{(n)}>0$. All states in a class have the same period. A chain, if irreducible, is called periodic
or aperiodic according as $t>1$ or $=1$.

We shall consider the number of times a state is visited in $n$ transitions. Let

$$N_n(i)=\{\text{the number of } v\text{'s such that } X_v=i \text{ for } 1\leq v\leq n\}$$

for $i=0,1,\dots$; $n=1,\dots$. More generally we define

$$N_n(i_1,\dots,i_r)=\{\text{the number of } v\text{'s such that } X_v=i_1,\dots,X_{v+r-1}=i_r \text{ for } 1\leq v\leq n\},$$

where $i_1,\dots,i_r$ is a finite subset of all states.
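The counts $N_n(i)$ and $N_n(i_1,\dots,i_r)$ are simple to compute from an observed path. The sketch below assumes the path is stored as a list whose entry at index $v$ is the observed value of $X_v$, starting at $v=0$, and that it is long enough to contain $X_{n+r-1}$; the function name is illustrative.

```python
def count_pattern(path, pattern, n):
    """N_n(i_1, ..., i_r): number of v in 1..n with X_v = i_1, ..., X_{v+r-1} = i_r.

    path[v] holds the observed value of X_v (v = 0, 1, ...); a pattern of
    length one gives the visit count N_n(i).
    """
    r = len(pattern)
    if n + r - 1 > len(path) - 1:
        raise ValueError("path is too short to evaluate the count for this n")
    target = tuple(pattern)
    return sum(1 for v in range(1, n + 1) if tuple(path[v:v + r]) == target)


if __name__ == "__main__":
    path = [0, 1, 0, 1, 1, 0]             # hypothetical observations X_0, ..., X_5
    print(count_pattern(path, (1,), 4))   # visits to state 1 among X_1, ..., X_4
    print(count_pattern(path, (1, 0), 4)) # occurrences of the transition 1 -> 0
```

Counting pairs in this way gives the transition frequencies on which estimates of the $p_{ij}$ are based.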


2. SoME LIMIT THEOREMS 


We shall assume throughout that the Markov chain under consideration is irreducible,
positive recurrent and aperiodic.

We first state, without proof, a theorem proved by Doeblin (1938). Consider the sequence
of random variables $\{Y_n\}$ attached to the Markov chain $\{X_n\}$ in the following way:

$Y_n=x_i$ if $X_n=i$ $(i=0,1,\dots)$, where the $x_i$'s are arbitrary real numbers. Let

$$U_l(i)=Y_{m_l+1}+\dots+Y_{m_{l+1}},$$

where $m_l$ is the trial in which state $i$ is reached for the $l$th time. It follows that $\{U_l(i)\}$
$(l=1,2,\dots)$ is a sequence of independent and identically distributed random variables.













Cyrus DERMAN 287 
Let $g(j)=x_j-EU(i)/m_{ii}$ (the dependence on $i$ is only apparent; cf. (2.1)).
THEOREM 1 (Doeblin). If $EU^2(i)<\infty$ and $\sigma_{ii}^2<\infty$, then the distribution of
S ¥,—— EYj(i)| n+ 
m=1 Mi 
tends as n +00 toa Normal distribution with mean zero and variance 
= 99) SI) S Mgt Mig — Mix 
= +2 > = i * g(k 
pr My, G=0,54i jj 2 Max - 
provided that the series converge absolutely. 
The expression for $\sigma^2$ is due to Chung (1953). Using Doeblin's theorem, we now prove


THEOREM 2. Let i,, ..., 1, be any finite sequence of distinct states belonging to the same positive 
class and suppose there exists a state i in the class such that o?; < 00, then there is a matrix £ 
such that the distribution of 

1 : n ; n 
{Mat - re) N,,(t,) — 


tty 








tends as n> 00 to a multivariate Normal distribution with mean zero and covariance matrix >, 
which can be found from the expression for o* in Doeblin’s theorem. 
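Proof. Chung (1953) has shown that
$$EU(i) = m_{ii}\sum_{j=0}^{\infty}\frac{x_j}{m_{jj}},$$
$$EU^2(i) = m_{ii}\sum_{j=0}^{\infty}\frac{x_j^2}{m_{jj}} + 2m_{ii}\sum_{j=0,\, j \ne i}^{\infty}\frac{x_j}{m_{jj}}\sum_{k=0}^{\infty}\frac{m_{ji} + m_{ik} - m_{jk}}{m_{kk}}\,x_k, \tag{2-1}$$
provided that the series converge absolutely.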
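Let $x_{i_1} = \lambda_1, \ldots, x_{i_r} = \lambda_r$, $x_j = 0$ for all $j \ne i_1, \ldots, i_r$, where $\lambda_1, \ldots, \lambda_r$ is any arbitrary set of real numbers. It is clear from (2-1) that $EU^2(i) < \infty$ and
$$EU(i) = m_{ii}\sum_{\alpha=1}^{r}\frac{\lambda_\alpha}{m_{i_\alpha i_\alpha}}. \tag{2-2}$$
Then
$$\sum_{m=1}^{n} Y_m - \frac{n}{m_{ii}}\,EU(i) = \sum_{\alpha=1}^{r}\lambda_\alpha N_n(i_\alpha) - n\sum_{\alpha=1}^{r}\frac{\lambda_\alpha}{m_{i_\alpha i_\alpha}} = \sum_{\alpha=1}^{r}\lambda_\alpha\Bigl(N_n(i_\alpha) - \frac{n}{m_{i_\alpha i_\alpha}}\Bigr). \tag{2-3}$$
Because of Theorem 1 and the continuity theorem for characteristic functions,
$$\lim_{n\to\infty} E\exp\Bigl\{\mathrm{i}\sum_{\alpha=1}^{r}\lambda_\alpha\Bigl(N_n(i_\alpha) - \frac{n}{m_{i_\alpha i_\alpha}}\Bigr)\Big/\sqrt{n}\Bigr\} = e^{-\frac{1}{2}Q}, \tag{2-4}$$
where $\mathrm{i}$ in the exponent is the imaginary unit and $Q = \sigma^2$ is a positive semi-definite quadratic function of $\lambda_1, \ldots, \lambda_r$. But since the $\lambda$'s are arbitrary (2-4) indicates the convergence of the distribution of the vector
$$n^{-1/2}\Bigl\{N_n(i_1) - \frac{n}{m_{i_1 i_1}},\ \ldots,\ N_n(i_r) - \frac{n}{m_{i_r i_r}}\Bigr\} \tag{2-5}$$
to a multivariate Normal distribution. The covariance matrix $\Sigma$ is determined by $Q$.
The fact that only one state $i$ such that $\sigma^2_{ii} < \infty$ is necessary for the truth of Theorem 2 suggests the possibility that the existence of one such state $i$ might imply $\sigma^2_{jj} < \infty$ for all $j$ in the same class. As pointed out in the introduction, the first recurrence moment is finite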

















288 Some asymptotic distribution theory for Markov chains 


for all or no states of the same class. The following theorem, proved by Chung (1954), is a generalization of this fact and confirms the truth of the above conjecture:


THEOREM 3 (Chung). The $p$-th moment, $p > 0$, of the recurrence time of $i$ is finite, if and only if for every $j$ and $k$ belonging to the same class, the $p$-th moments of the first passage times from $j$ to $k$ and from $k$ to $j$ are finite.

Let $\{X_n(r)\}$ denote a Markov chain derived from $\{X_n\}$ as follows: $X_n(r) = (i_1, \ldots, i_r)$ when $X_n = i_r, \ldots, X_{n-r+1} = i_1$ for every set of states $i_1, \ldots, i_r$. States having probability zero of occurring will be ignored.


LEMMA 1. If the $p$-th recurrence moments for the states of $\{X_n\}$ are finite, then the $p$-th recurrence moments for the states of $\{X_n(r)\}$ are finite.

Proof. Consider any state $(i_1, \ldots, i_r)$ which has positive probability of occurring. Suppose $X_0 = i_1$. Let $\tau$ be a random variable which denotes the number of steps until the $r$th recurrence of $i_1$. Since $\tau$ is the sum of $r$ independent and identically distributed random variables (the number of steps until one recurrence of $i_1$) each of which has by hypothesis a finite $p$th moment, it follows easily that $E\tau^p < \infty$. We shall say the event $\epsilon$ is associated with $\tau$ if $X_\nu(r) = (i_1, \ldots, i_r)$ for some $\nu$ $(r - 1 < \nu \le \tau)$; i.e. the event $\epsilon$ is associated with a random variable $\tau$ if during the course of the $r$ recurrences of $i_1$ the sequence of states $i_1, \ldots, i_r$ is visited in succession. Let $\tau_1, \ldots, \tau_N$ denote $N$ successive such random variables where $N$ is the smallest possible integer such that the event $\epsilon$ is associated with two of the random variables. The $N$ random variables are independently distributed. $\tau_N$ and one other random variable have for their distributions the conditional distribution of $\tau$ given that $\epsilon$ is associated with it, and the $N - 2$ others have for their distributions the conditional distribution of $\tau$ given that $\epsilon$ is not associated with it. Now, if it can be shown that $E(\tau_1 + \ldots + \tau_N)^p < \infty$, it will follow, since at least two occurrences of the state $(i_1, \ldots, i_r)$ take place during the time $\tau_1 + \ldots + \tau_N$, that the $p$th recurrence moment of the state $(i_1, \ldots, i_r)$ is finite. To this end let $P$ denote the probability that the event $\epsilon$ is associated with $\tau$. Since $(i_1, \ldots, i_r)$ has positive probability of occurring, $P > 0$. If $P = 1$, then $N = 2$ with probability 1, and it is easily seen that $E(\tau_1 + \tau_2)^p < \infty$. Suppose $P < 1$. Let $E(\tau^p \mid \epsilon)$ and $E(\tau^p \mid \bar\epsilon)$ denote the conditional expectations of $\tau^p$ given that $\epsilon$ is associated and not associated with $\tau$ respectively. Since
$$E\tau^p = E(\tau^p \mid \epsilon)P + E(\tau^p \mid \bar\epsilon)(1 - P) < \infty$$
it follows that $E(\tau^p \mid \epsilon)$ and $E(\tau^p \mid \bar\epsilon)$ are both finite. It is clear that
$$\Pr(N = \nu) = (\nu - 1)P^2(1 - P)^{\nu-2} \quad (\nu = 2, \ldots),$$
i.e. $N$ has a Pascal distribution. Therefore
$$E(\tau_1 + \ldots + \tau_N)^p = \sum_{\nu=2}^{\infty} E(\tau_1 + \ldots + \tau_\nu)^p \Pr(N = \nu)$$
$$\le \sum_{\nu=2}^{\infty} E(\nu\max(\tau_1, \ldots, \tau_\nu))^p (\nu - 1)P^2(1 - P)^{\nu-2}$$
$$= \sum_{\nu=2}^{\infty} \nu^p E(\max(\tau_1, \ldots, \tau_\nu))^p (\nu - 1)P^2(1 - P)^{\nu-2}$$
$$\le \sum_{\nu=2}^{\infty} \nu^{p+2}\max(E(\tau^p \mid \epsilon), E(\tau^p \mid \bar\epsilon))P^2(1 - P)^{\nu-2}$$
$$< \infty.$$
This proves the lemma.
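The Pascal distribution used in the proof, $\Pr(N = \nu) = (\nu - 1)P^2(1 - P)^{\nu-2}$, can be checked numerically. The sketch below is only an illustration; the function name `pascal_pmf` and the truncation point of the sums are assumptions made for the example.

```python
def pascal_pmf(v: int, P: float) -> float:
    """Probability that the second 'success' occurs at trial v (v = 2, 3, ...)."""
    if v < 2:
        return 0.0
    return (v - 1) * P**2 * (1.0 - P) ** (v - 2)


def truncated_moment(P: float, power: float, upper: int = 2000) -> float:
    """Sum of v**power * Pr(N = v) for v = 2..upper-1 (truncated series)."""
    return sum(v**power * pascal_pmf(v, P) for v in range(2, upper))


if __name__ == "__main__":
    P = 0.3
    print(truncated_moment(P, 0))   # total probability, close to 1
    print(truncated_moment(P, 1))   # mean of N, close to 2 / P
    print(truncated_moment(P, 5))   # a higher moment; finite because of the geometric tail
```

The geometric factor $(1 - P)^{\nu-2}$ dominates any fixed power of $\nu$, which is the reason the final series in the proof converges.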













Now let $i_{11}, \ldots, i_{1r_1};\ i_{21}, \ldots, i_{2r_2};\ \ldots;\ i_{k1}, \ldots, i_{kr_k}$ be $k$ sets of states each having $r_1, \ldots, r_k$ members respectively. Let
$$P_l = \prod_{\alpha=1}^{r_l - 1} p_{i_{l\alpha}\, i_{l,\alpha+1}} \quad (l = 1, \ldots, k).$$
We prove now a slight, but useful, generalization of Theorem 2.
THEOREM 4. If $i_{11}, \ldots, i_{kr_k}$ belong to the same class and if $\sigma^2_{ii} < \infty$ for $i$ in the same class, then as $n \to \infty$ the distribution of
$$n^{-1/2}\Bigl\{N_n(i_{11}, \ldots, i_{1r_1}) - \frac{nP_1}{m_{i_{11}i_{11}}},\ \ldots,\ N_n(i_{k1}, \ldots, i_{kr_k}) - \frac{nP_k}{m_{i_{k1}i_{k1}}}\Bigr\} \tag{2-6}$$
tends to a multivariate Normal distribution with mean zero and covariance matrix $\Sigma$ to be determined later.

Proof. Let $r = \max(r_1, \ldots, r_k)$. Let $\lambda_1, \ldots, \lambda_k$ be arbitrary real numbers and associate $\lambda_l$ with the sequence of states $i_{l1}, i_{l2}, \ldots, i_{lr_l}$ $(l = 1, \ldots, k)$. Let $\{Y_n(r)\}$ be defined from $X_n(r)$ as follows:
$$Y_n(r) = \sum_{\alpha}\lambda_l \quad \text{if } X_n(r) = (i_1, \ldots, i_{r-r_l}, i_{l1}, \ldots, i_{lr_l}),$$
where $\sum_\alpha$ denotes the sum over all such possible $l$ for a given state of $X_n(r)$; e.g. if $j$ and $i, j$ are two sets of states and $\lambda_1$ and $\lambda_2$ are associated with $j$ and $i, j$ respectively, then whenever $X_n(2) = (i, j)$, $Y_n(2) = \lambda_1 + \lambda_2$. If for a given state there is no such $l$, then $Y_n(r) = 0$. Let $(i_1, \ldots, i_r)$ be any state of $\{X_n(r)\}$. Define $U(i_1, \ldots, i_r)$ as in Theorem 2. By Lemma 1 the variance of the recurrence time of $(i_1, \ldots, i_r)$ is finite. We have, also, that $EU^2(i_1, \ldots, i_r) < \infty$, since $U(i_1, \ldots, i_r) \le Z\sum_{l=1}^{k}|\lambda_l|$, where $Z$ is the recurrence time of $(i_1, \ldots, i_r)$. Using the fact that
$$\lim_{n\to\infty}\frac{1}{n}\sum_{\nu=0}^{n} p_{ii}^{(\nu)} = \frac{1}{m_{ii}}$$
for all $i$, it is easy to see that the mean recurrence time of any state $(i_1, \ldots, i_r)$, which we denote by $m(i_1, \ldots, i_r)$, is
$$\frac{m_{i_1 i_1}}{p_{i_1 i_2}\cdots p_{i_{r-1} i_r}}.$$
Thus, since absolute convergence is easily proved, we have from (2-1)
$$\frac{EU(i_1, \ldots, i_r)}{m(i_1, \ldots, i_r)} = \sum_{l=1}^{k}\sum_{\alpha}\frac{\lambda_l}{m(i_1, \ldots, i_{r-r_l}, i_{l1}, \ldots, i_{lr_l})} = \sum_{l=1}^{k}\sum_{\alpha}\frac{\lambda_l\, p_{i_1 i_2}\cdots p_{i_{r-r_l} i_{l1}}P_l}{m_{i_1 i_1}} = \sum_{l=1}^{k}\frac{\lambda_l P_l}{m_{i_{l1}i_{l1}}}, \tag{2-7}$$
where $\sum_\alpha$ denotes the sum over all possible sequences $i_1, \ldots, i_{r-r_l}$. The argument used in proving Theorem 2 now applies again. This proves the theorem.
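The mean recurrence time of a tuple state used in the proof, $m(i_1, \ldots, i_r) = m_{i_1 i_1}/(p_{i_1 i_2}\cdots p_{i_{r-1} i_r})$, can be illustrated for a finite chain, where $1/m_{ii}$ is the stationary probability of $i$. The sketch below is a hypothetical illustration only: the paper treats denumerable chains, and the function names and the use of power iteration are assumptions made for the example.

```python
from typing import List, Sequence


def stationary_distribution(P: List[List[float]], iters: int = 2000) -> List[float]:
    """Stationary probabilities of a finite irreducible aperiodic chain by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def tuple_mean_recurrence(P: List[List[float]], states: Sequence[int]) -> float:
    """m(i_1,...,i_r) = m_{i_1 i_1} / (p_{i_1 i_2} ... p_{i_{r-1} i_r})."""
    pi = stationary_distribution(P)
    prob = pi[states[0]]                 # 1 / m_{i_1 i_1}
    for a, b in zip(states, states[1:]):
        prob *= P[a][b]                  # multiply in each transition probability
    return 1.0 / prob                    # tuples of probability zero are excluded by assumption


if __name__ == "__main__":
    P = [[0.5, 0.5], [0.2, 0.8]]
    print(tuple_mean_recurrence(P, [0]))      # m_00
    print(tuple_mean_recurrence(P, [0, 1]))   # m(0, 1) = m_00 / p_01
```

For this two-state matrix the stationary probabilities are $2/7$ and $5/7$, so $m_{00} = 3.5$ and $m(0, 1) = 3.5/0.5 = 7$.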

The cases of interest to us below will be where the $r_l$'s are either 1 or 2. However, we shall maintain the general notation as long as possible.

For statistical applications of Theorem 4, it is necessary to be able to determine $\Sigma$, the covariance matrix of (2-6). From Formula 4 below it follows that the vector
$$n^{-1/2}\{N_n(i_{11}, \ldots, i_{1r_1}) - EN_n(i_{11}, \ldots, i_{1r_1}),\ \ldots,\ N_n(i_{k1}, \ldots, i_{kr_k}) - EN_n(i_{k1}, \ldots, i_{kr_k})\} \tag{2-8}$$
and (2-6) have the same limiting distribution. Feller (1949) has shown, assuming the finiteness of a second recurrence moment, that the limit of the variances of (2-8) as $n \to \infty$ is equal to the variances of the limiting distribution of (2-8). If we can show that the same holds for the covariances then it will follow that $\Sigma$ can be approximated by computing the covariance matrix of (2-8).

The following theorem is for this purpose. 


THEOREM 5. Suppose the sequence of random vectors $(X_n, Y_n)$ converges in distribution to a joint distribution function $F(x, y)$ which has finite second moments. Let $(X, Y)$ be a random vector with joint distribution function $F(x, y)$. If $\lim_{n\to\infty} EX_n^2 = EX^2$ and $\lim_{n\to\infty} EY_n^2 = EY^2$, then
$$\lim_{n\to\infty} EX_nY_n = EXY.$$

Proof. Let $Z_n = X_n + Y_n$, $Z = X + Y$. A consequence of the Helly-Bray theorem is that $EZ^2 \le \liminf_{n\to\infty} EZ_n^2$. This implies $E(XY) \le \liminf_{n\to\infty} E(X_nY_n)$. Now let $Z_n' = X_n - Y_n$, $Z' = X - Y$. In the same way we get that $E(XY) \ge \limsup_{n\to\infty} E(X_nY_n)$. Therefore $\lim_{n\to\infty} E(X_nY_n) = E(XY)$, proving the theorem.


3. COMPUTATION OF THE COVARIANCE MATRIX 


We now commence with the computation of the covariance matrix of (2-8). It will be 
convenient to prove some asymptotic formulae which will be needed in the computation. 
In all that follows the second recurrence moment will be assumed finite.

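We state without proof

LEMMA 2. If $a_\nu \ge 0$, $\lim_{n\to\infty} b_n = b$ and $\lim_{n\to\infty} a_n \big/ \sum_{\nu=0}^{n} a_\nu = 0$, then
$$\lim_{n\to\infty}\frac{\sum_{\nu=0}^{n} a_{n-\nu}b_\nu}{\sum_{\nu=0}^{n} a_\nu} = b.$$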

Formula 1. “6 
S (2)-;-) = My Ma) 6,5, =0,1,...), 
v= My) 2m5, My 
where 


Proof. We have directly 
a s (v) 
s (om — 4 oS (n—v) vat 
5 0-2) sata) ‘ 
n=1 M35 n= ‘4 M43 


(3-1) 


Feller (1949) has shown that 





We also have that 


Hence using Lemma 2 we get Formula 1. 










Lemma 3. 


| 
M 
3 
<= 
ime 
3 
i 
3 
' 


Eo) a 
n p Fe = 
n=1 “my n= ti Miz 
n 
aa (v) 
Sf? S nf per» ( oi ?) 
= APE n(pit-” -—-)— n= 
v=1 ” n=v ‘i Mii n=1 Mii 





N N-v 1 N Rug +4 1 N ~, 
=i? = (o9 -—-)+ Xf? X (ve -—-)- 7 joke 
v=1 t=0 \ My} v=1° t=0 Miz =f 








n Mix 

(3-2) 

By Lemma 2 and Feller’s result the first term on the right of (3-2) is bounded. Now 
+ fe e( . =] < 3 fi y= > ‘|pQ— > (3:3) 

N| far io ma) | Mix 
) . 7 

Feller (1949) has shown that » ps? — —| <oo. Hence lim = yt () _ Pa 0. 

/: Mix n->o Ny Mey 











Using Lemma 2 we then see that the second term on the right of (3-2) is o(N’). 
A similar argument, making use of the fact that x (ie. - z f i?) <0, applies to the third 
term of (3-2). Hence Lemma 3 is proved. 














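Formula 2.
$$\sum_{\nu=1}^{n-r}(n - \nu - k)\,p_{li}^{(\nu)} = n\Bigl(\frac{M_{ii} - 2(m_{li} - 1)m_{ii}}{2m_{ii}^2}\Bigr) + \frac{(n - k)(n - k - 1)}{2m_{ii}} + o(n)$$
for $l, i = 0, 1, \ldots$ and any integers $r, k \ge 0$.
Proof. We have
$$\sum_{\nu=1}^{n-r} p_{li}^{(\nu)}(n - \nu - k) = \sum_{\nu=1}^{n-r}(n - \nu - k)\Bigl(p_{li}^{(\nu)} - \frac{1}{m_{ii}}\Bigr) + \frac{1}{m_{ii}}\sum_{\nu=1}^{n-r}(n - \nu - k)$$
$$= n\sum_{\nu=1}^{n-r}\Bigl(p_{li}^{(\nu)} - \frac{1}{m_{ii}}\Bigr) - \sum_{\nu=1}^{n-r}(\nu + k)\Bigl(p_{li}^{(\nu)} - \frac{1}{m_{ii}}\Bigr) + \frac{1}{m_{ii}}\sum_{\nu=1}^{n-r}(n - \nu - k)$$
$$= n\Bigl(\frac{M_{ii} - 2(m_{li} - 1)m_{ii}}{2m_{ii}^2}\Bigr) + \frac{(n - k - 1)(n - k)}{2m_{ii}} + o(n)$$
by virtue of Formula 1 and Lemma 3.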
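Formula 3.
$$\sum_{\nu=1}^{n-r} p_{li}^{(\nu)}\sum_{t=1}^{n-k-\nu} p_{ij}^{(t)} = \frac{(n - k)(n - k - 1)}{2m_{ii}m_{jj}} + n\Bigl\{\frac{M_{ii} - 2(m_{li} - 1)m_{ii}}{2m_{ii}^2 m_{jj}} + \frac{M_{jj} - 2(m_{ij} - 1)m_{jj}}{2m_{jj}^2 m_{ii}}\Bigr\} + o(n)$$
for $l, i, j = 0, 1, \ldots$ and all integers $r, k \ge 0$.

Proof. We have
$$\sum_{\nu=1}^{n-r} p_{li}^{(\nu)}\sum_{t=1}^{n-k-\nu} p_{ij}^{(t)} = \sum_{\nu=1}^{n-r} p_{li}^{(\nu)}\sum_{t=1}^{n-k-\nu}\Bigl(p_{ij}^{(t)} - \frac{1}{m_{jj}}\Bigr) + \frac{1}{m_{jj}}\sum_{\nu=1}^{n-r}(n - k - \nu)\,p_{li}^{(\nu)}. \tag{3-5}$$
Using the fact that $\sum_{\nu=1}^{n} p_{li}^{(\nu)} = n/m_{ii} + o(n)$, Formula 1, and Lemma 2 on the first term on the right-hand side of (3-5) and Formula 2 on the second term we get that (3-5) yields Formula 3.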





We are now in a position to calculate the moments of $N_n(i_1, \ldots, i_r)$.
Formula 4. 
EN, (i,,...,4,) = se 4 Man area +0(1), 
Mi, 2m i, 


where $P = \prod_{\alpha=1}^{r-1} p_{i_\alpha i_{\alpha+1}}$ and $l$ is the initial state.


Proof. Let $Y_\nu = 1$ if $X_\nu = i_r$, $X_{\nu-1} = i_{r-1}, \ldots, X_{\nu-r+1} = i_1$; $Y_\nu = 0$ otherwise. Since
X, =! we have 





EN, (11, ---54,) = £ zy, = P : ne (3-6) 
The result follows by using Formula 1. 
Formula 5. 
var N,,(4;, ...,4,) = nP| : +P(- ene + Stats) + o(n) 
ih Mii Minis 
for 4, +4,, ...,4,_3. 


Proof. We have 
n 2 n n—1 n 
ENYiy..i) =H SY) =H SV+2zy, 3 Yl 
v=1 v=r v=r t=v+1 


n-r n—v—r+1 
= EN (iy. i,)+2P? = pip Spe. (3-7) 


Application of Formulae 3 and 4, together with the fact that 
var N,,(t,,...,4,) = HN? (i,,...,4,) — (EN, (t,, ...,4,))? 
proves the result. 
Formula 6. 
cov (N,,(i), N,(j)) = —— _— (eat Md 4 5 (ey Ms; )| +0(n). 


2 2 
Mm,;,™M; Mi M5; 2 MiMi, MF;Mgy 





Proof. The proof is similar to those of Formulae 4 and 5. 
Formula 7. 
var N,(i,7) = "Pit ( +Pu(--+38)) +0(n) 
-_ Mig “\mg Mi, 
Proof. The proof is as in the proof of Formula 5 allowing for the fact that the lower limit 
for ¢ in (3-7) is 0 instead of 1. 
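Applying the well-known techniques of asymptotic distribution theory we get easily
$$\operatorname{var}\Bigl\{\sqrt{n}\,\frac{N_n(i,j)}{N_n(i)}\Bigr\} \sim \frac{m_{ii}^2}{n}\operatorname{var}N_n(i,j) + p_{ij}^2\,\frac{m_{ii}^2}{n}\operatorname{var}N_n(i) - 2p_{ij}\,\frac{m_{ii}^2}{n}\operatorname{cov}(N_n(i), N_n(i,j)),$$
$$\operatorname{cov}\Bigl\{\sqrt{n}\,\frac{N_n(i,j)}{N_n(i)},\ \sqrt{n}\,\frac{N_n(k,l)}{N_n(k)}\Bigr\} \sim \frac{m_{ii}m_{kk}}{n}\{\operatorname{cov}(N_n(i,j), N_n(k,l)) - p_{kl}\operatorname{cov}(N_n(i,j), N_n(k))$$
$$\qquad - p_{ij}\operatorname{cov}(N_n(i), N_n(k,l)) + p_{ij}p_{kl}\operatorname{cov}(N_n(i), N_n(k))\}. \tag{3-8}$$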











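Using the above formulae and (3-8) and after carrying out the computations necessary to obtain
$$\operatorname{cov}(N_n(i,j), N_n(k,l)) \quad\text{and}\quad \operatorname{cov}(N_n(i), N_n(j,k))$$
for the various special cases we get the following results:
$$\operatorname{var}\Bigl\{\sqrt{n}\,\frac{N_n(i,j)}{N_n(i)}\Bigr\} \sim m_{ii}\,p_{ij}(1 - p_{ij}), \qquad \operatorname{cov}\Bigl\{\sqrt{n}\,\frac{N_n(i,j)}{N_n(i)},\ \sqrt{n}\,\frac{N_n(k,l)}{N_n(k)}\Bigr\} \sim \begin{cases} -m_{ii}\,p_{ij}\,p_{il} & \text{if } i = k, \\ 0 & \text{if } i \ne k. \end{cases} \tag{3-9}$$

The analogy of (3-9) with the second moments of a multinomial distribution, with $m_{ii}$ playing the role of the number of trials starting from $i$, is clear.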
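The ratio $N_n(i,j)/N_n(i)$ appearing in (3-9) is the natural estimate of $p_{ij}$. The following sketch computes these ratios from an observed path and evaluates the approximate variance $m_{ii}p_{ij}(1 - p_{ij})/n$ obtained by reading (3-9) as a statement about $\sqrt{n}\,N_n(i,j)/N_n(i)$. The function names are hypothetical, and $N_n(i)$ is taken here to count every visit to $i$, including a visit at the final step.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def transition_estimates(path: Sequence[int]) -> Dict[Tuple[int, int], float]:
    """Estimates N_n(i, j) / N_n(i) for every observed pair (i, j)."""
    visits = Counter(path)                    # N_n(i)
    pairs = Counter(zip(path, path[1:]))      # N_n(i, j)
    return {(i, j): c / visits[i] for (i, j), c in pairs.items()}


def approximate_variance(m_ii: float, p_ij: float, n: int) -> float:
    """Large-sample variance of the estimate: m_ii * p_ij * (1 - p_ij) / n."""
    return m_ii * p_ij * (1.0 - p_ij) / n


if __name__ == "__main__":
    est = transition_estimates([0, 1, 0, 1, 1, 0])
    print(est)
    print(approximate_variance(2.0, 0.5, 100))
```

Since $n/m_{ii}$ is the approximate number of visits to $i$, the variance above has the familiar binomial form $p(1 - p)/(\text{number of trials})$.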


4. APPLICATION TO A GOODNESS OF FIT TEST 


Consider the problem of testing the hypothesis that the matrix of transition probabilities is a given matrix $\{p_{ij}\}$ on the basis of $X_1, \ldots, X_n$, where $X_i$ $(i = 1, \ldots, n)$ are observed states of the Markov chain in the first $n$ steps. It is clear that a finite number of observations will not
provide estimates of all the transition probabilities. This suggests choosing a finite subset 
of the transition probabilities independently of the data and testing the hypothesis 
that they are the true transition probabilities, i.e. testing, only partially, the original 
hypothesis. A rejection of the partial hypothesis would, of course, imply a rejection of 
the original hypothesis. The question as to how to select the subset of transition 
probabilities will not be considered. Thus, as far as this paper is concerned, the selection is 
arbitrary. 


Let
$$\hat p_{ij} = \frac{N_n(i,j)}{N_n(i)}, \qquad Z_{ij} = \frac{\sqrt{n}\,(\hat p_{ij} - p_{ij})}{\sqrt{(m_{ii}\,p_{ij})}}, \qquad R(C_i) = 1 - \sum_{j \in C_i} p_{ij},$$
$$\hat R(C_i) = 1 - \sum_{j \in C_i}\hat p_{ij} \quad\text{and}\quad Z_i(C_i) = \frac{\sqrt{n}\,(\hat R(C_i) - R(C_i))}{\sqrt{(m_{ii}\,R(C_i))}},$$
where $C_i$ denotes a finite class of states depending on $i$. It can be shown using (3-9) that
$$\operatorname{var}(Z_{ij}) \sim (1 - p_{ij}), \qquad \operatorname{cov}(Z_{ij}, Z_{il}) \sim -\sqrt{(p_{ij}p_{il})} \quad (l \ne j),$$
$$\operatorname{cov}(Z_{ij}, Z_{kl}) \sim 0 \quad (i \ne k),$$
$$\operatorname{cov}(Z_{ij}, Z_i(C_i)) \sim -\sqrt{\{p_{ij}R(C_i)\}}, \tag{4-1}$$
$$\operatorname{cov}(Z_{ij}, Z_k(C_k)) \sim 0 \quad (k \ne i),$$
$$\operatorname{var}(Z_i(C_i)) \sim 1 - R(C_i), \qquad \operatorname{cov}(Z_i(C_i), Z_k(C_k)) \sim 0 \quad (i \ne k).$$

It follows from well-known asymptotic distribution theory using Theorem 4, that, if $\{p_{ij}\}$ is the true matrix of transition probabilities, the limiting distribution of any finite set of $Z_{ij}$'s and $Z_i(C_i)$'s is multivariate Normal with zero means and covariance matrix given by (4-1). Let $\tilde Z_{ij}$ $(j \in C_i)$, $\tilde Z_i(C_i)$ $(i = 1, \ldots, k)$ be random variables with a multivariate normal distribution with zero means and covariance matrix given by (4-1). Then it follows (see Cramér, 1946, p. 419) that
$$X^2 = \sum_{i=1}^{k}\Bigl\{\sum_{j \in C_i}\tilde Z_{ij}^2 + \tilde Z_i^2(C_i)\Bigr\}$$
has a $\chi^2$ distribution with $\sum_{i=1}^{k} d_i$ degrees of freedom where $d_i$ denotes the number of states in $C_i$. Since the corresponding $Z_{ij}$'s and $Z_i(C_i)$'s have such a limiting distribution, it also follows (see Cramér, 1946, p. 314) that the limiting distribution of
$$\chi^2 = \sum_{i=1}^{k}\Bigl\{\sum_{j \in C_i} Z_{ij}^2 + Z_i^2(C_i)\Bigr\}$$
is that of $X^2$. Thus the $\chi^2$ statistic supplies a $\chi^2$ test for the goodness of fit of a given finite set of transition probabilities. The $m_{ii}$'s in $\chi^2$ can, in principle, be derived from $\{p_{ij}\}$. In most cases this will not be feasible. The limiting distribution of $\chi^2$ will not be changed if the $m_{ii}$'s are replaced by their estimates obtained from the recurrence times.
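A minimal sketch of the computation of the statistic follows, taking $Z_{ij}^2 = n(\hat p_{ij} - p_{ij})^2/(m_{ii}p_{ij})$ and the analogous remainder term for each class $C_i$. The function name `markov_chi2`, the dictionary-based inputs, and the default estimate $n/N_n(i)$ for $m_{ii}$ are assumptions made for the illustration; the paper itself only says that the $m_{ii}$'s may be estimated from the recurrence times.

```python
from typing import Dict, Optional, Sequence, Tuple


def markov_chi2(
    n: int,
    visits: Dict[int, int],                  # N_n(i)
    pair_counts: Dict[Tuple[int, int], int], # N_n(i, j)
    p0: Dict[Tuple[int, int], float],        # hypothesised p_ij for j in C_i
    classes: Dict[int, Sequence[int]],       # C_i for each tested state i
    m: Optional[Dict[int, float]] = None,    # mean recurrence times m_ii, if known
) -> Tuple[float, int]:
    """Return (chi-square statistic, degrees of freedom sum of d_i)."""
    stat, dof = 0.0, 0
    for i, C in classes.items():
        # Assumption: when m_ii is not supplied, estimate it by n / N_n(i).
        m_ii = m[i] if m is not None else n / visits[i]
        rest0, rest_hat = 1.0, 1.0           # R(C_i) and its estimate
        for j in C:
            p = p0[(i, j)]
            p_hat = pair_counts.get((i, j), 0) / visits[i]
            stat += n * (p_hat - p) ** 2 / (m_ii * p)   # Z_ij squared
            rest0 -= p
            rest_hat -= p_hat
        if rest0 > 1e-12:                    # skip the remainder cell when R(C_i) = 0
            stat += n * (rest_hat - rest0) ** 2 / (m_ii * rest0)
        dof += len(C)
    return stat, dof


if __name__ == "__main__":
    stat, dof = markov_chi2(
        100, {0: 50}, {(0, 0): 20, (0, 1): 30},
        {(0, 0): 0.5, (0, 1): 0.3}, {0: [0, 1]}, m={0: 2.0},
    )
    print(stat, dof)
```

With the default estimate of $m_{ii}$ each term reduces to $N_n(i)(\hat p_{ij} - p_{ij})^2/p_{ij}$, the usual multinomial goodness-of-fit form.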


I wish to thank Profs. T. W. Anderson and D. A. Darling for several helpful conversations 
held while this work was in progress. 


REFERENCES 


ANDERSON, T. W. & GOODMAN, L. (1956). Statistical inference in Markov chains. To appear in Ann. Math. Statist.

BARTLETT, M. S. (1951). The frequency goodness of fit test for probability chains. Proc. Camb. Phil. Soc. 47, 86-95.

CHUNG, K. L. (1953). Contributions to the theory of Markov chains. I. J. Res. Nat. Bur. Stand. 50, 203-8.

CHUNG, K. L. (1954). Contributions to the theory of Markov chains. II. Trans. Amer. Math. Soc. 76, 397-419.

CRAMÉR, H. (1946). Mathematical Methods of Statistics. Princeton University Press.

DOEBLIN, W. (1938). Sur deux problèmes de M. Kolmogoroff concernant les chaînes dénombrables. Bull. Soc. math. Fr. 66, 210-20.

FELLER, W. (1949). Fluctuation theory of recurrent events. Trans. Amer. Math. Soc. 67, 98-119.

FELLER, W. (1950). An Introduction to Probability Theory and its Applications. New York: John Wiley and Sons.

KOLMOGOROV, A. N. (1936). Anfangsgründe der Theorie der Markoffschen Ketten mit unendlich vielen möglichen Zuständen. Matematicheskii Sbornik, N.S., 1, 607-10.










A GENERAL METHOD FOR APPROXIMATING TO THE 
DISTRIBUTION OF LIKELIHOOD RATIO CRITERIA 


By D. N. LAWLEY 
University of Edinburgh 


1. INTRODUCTION 


In the theory of testing statistical hypotheses it is well known that $-2\log\lambda$, where $\lambda$ is the likelihood criterion of Neyman & Pearson (1928), is distributed for large samples approximately as $\chi^2$. This was proved by Wilks (1938). Box (1949) has shown that for a certain class of such criteria it is possible to improve on this approximation in various ways, so that in cases where the exact distribution of the criterion is unknown good approximations based on a knowledge of the moments of the criterion can be developed even for samples of moderate size. The simplest improvement consists in multiplying $-2\log\lambda$ by a scale factor which results in a statistic having the same moments as $\chi^2$ ignoring quantities of order $n^{-2}$, where $n$ is the size of the sample. This scaling device was first used by Bartlett (1937; see also his recent note, 1954). The object of this paper is to show that for any likelihood function satisfying certain very general conditions an improved $\chi^2$ test of this type is, in theory, possible.


2. AN EXPANSION FOR $L^{(k)}$


Let us suppose that the likelihood function, whose logarithm will be denoted by $L$, depends upon $p + q$ population parameters $\theta_1, \theta_2, \ldots, \theta_{p+q}$, assumed functionally independent. We shall assume that $L$ and its partial derivatives with respect to the $\theta$'s satisfy some uniform continuity condition which allows differentiations with respect to the $\theta$'s to commute with integration over the sample space. We shall further assume that the second derivatives $\partial^2 L/\partial\theta_r^2$ are of order $n$, where $n$ is related to the number of observations. These assumptions would usually be satisfied in practice. Let $\theta_r^0$ denote the true value of $\theta_r$. We shall be concerned with testing the 'composite' hypothesis $H_0$ that $\theta_{p+1}, \theta_{p+2}, \ldots, \theta_{p+q}$ have specified values, which if $H_0$ is true we can take to be $\theta_{p+1}^0, \theta_{p+2}^0, \ldots, \theta_{p+q}^0$, while $\theta_1, \theta_2, \ldots, \theta_p$ are unspecified and unknown 'nuisance' parameters. The criterion obtained by taking minus twice the log likelihood ratio will be written as $2(L^{(p+q)} - L^{(p)})$, where $L^{(k)}$ denotes the result of maximizing $L$ with respect to $\theta_1, \theta_2, \ldots, \theta_k$ and substituting true values for the remaining parameters.

We shall make full use of methods developed by Bartlett (1953 a, b, 1955) in three papers on approximate confidence intervals. The expansion for $L^{(k)}$ which we now obtain follows very closely § 8 of the second of these papers (in future referred to as II), though our notation is different and we have taken in terms of higher order. We shall use the notation
$$l_r = \partial L/\partial\theta_r, \quad L_{rs} = \partial^2 L/\partial\theta_r\partial\theta_s, \quad L_{rst} = \partial^3 L/\partial\theta_r\partial\theta_s\partial\theta_t, \quad\text{etc.},$$
$$\lambda_{rs} = E(L_{rs}), \quad \lambda_{rst} = E(L_{rst}), \quad\text{etc.},$$
$$l_{rs} = L_{rs} - \lambda_{rs}, \quad l_{rst} = L_{rst} - \lambda_{rst}, \quad\text{etc.}$$
Then all $\lambda$'s are, in general, of order $n$ and the $l$'s are random variates of order $\sqrt{n}$ with zero expectations.











296 Distribution of likelihood ratio criteria 


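Let $\hat\theta_1, \hat\theta_2, \ldots, \hat\theta_k$ be the estimates inserted for $\theta_1, \theta_2, \ldots, \theta_k$ in $L$ to produce $L^{(k)}$. The equations for determining the $\hat\theta_r$ may be written
$$0 = l_r + L_{rs}x_s + \tfrac{1}{2}L_{rst}x_sx_t + \tfrac{1}{6}L_{rstu}x_sx_tx_u + \ldots \quad (r = 1, 2, \ldots, k), \tag{1}$$
where $x_r = \hat\theta_r - \theta_r$, and where here and elsewhere, unless otherwise stated, the usual summation convention is employed. All suffices run from 1 to $k$. The inverse expansion of $x_r$ in terms of the $l_r$ is found, with a little algebra, to be
$$x_r = -L^{rs}l_s - \tfrac{1}{2}A_{rst}l_sl_t - \tfrac{1}{2}B_{rs,tu}l_sl_tl_u + \tfrac{1}{6}C_{rstu}l_sl_tl_u + \ldots, \tag{2}$$
where
$$A_{rst} = L^{rg}L^{sh}L^{ti}L_{ghi},$$
$$B_{rs,tu} = L^{rg}L^{sh}L^{ti}L^{uj}L^{vw}L_{ghv}L_{ijw},$$
$$C_{rstu} = L^{rg}L^{sh}L^{ti}L^{uj}L_{ghij},$$
and where $[L^{rs}]$ is the inverse matrix of $[L_{rs}]$.
We next write
$$L^{(k)} = L + l_rx_r + \tfrac{1}{2}L_{rs}x_rx_s + \tfrac{1}{6}L_{rst}x_rx_sx_t + \tfrac{1}{24}L_{rstu}x_rx_sx_tx_u + \ldots,$$
which in view of (1), is equivalent to
$$L - L^{(k)} = \tfrac{1}{2}L_{rs}x_rx_s + \tfrac{1}{3}L_{rst}x_rx_sx_t + \tfrac{1}{8}L_{rstu}x_rx_sx_tx_u + \ldots$$
Substituting for the $x_r$ in this, using (2), we have
$$2(L^{(k)} - L) = -L^{rs}l_rl_s - \tfrac{1}{3}A_{rst}l_rl_sl_t - \tfrac{1}{4}B_{rs,tu}l_rl_sl_tl_u + \tfrac{1}{12}C_{rstu}l_rl_sl_tl_u + \ldots$$
Hence, including terms up to the fourth degree in the $l$'s,
$$2(L^{(k)} - L) = -\lambda^{rs}l_rl_s - \tfrac{1}{3}\alpha_{rst}l_rl_sl_t + \lambda^{rt}\lambda^{su}l_rl_sl_{tu} - \tfrac{1}{4}\beta_{rs,tu}l_rl_sl_tl_u$$
$$\qquad + \tfrac{1}{12}\gamma_{rstu}l_rl_sl_tl_u + \lambda^{rv}\alpha_{stu}l_rl_sl_tl_{uv} - \tfrac{1}{3}\lambda^{ru}\lambda^{sv}\lambda^{tw}l_rl_sl_tl_{uvw}$$
$$\qquad - \lambda^{rt}\lambda^{su}\lambda^{vw}l_rl_sl_{tv}l_{uw}, \tag{3}$$
where
$$\alpha_{rst} = \lambda^{rg}\lambda^{sh}\lambda^{ti}\lambda_{ghi},$$
$$\beta_{rs,tu} = \lambda^{rg}\lambda^{sh}\lambda^{ti}\lambda^{uj}\lambda^{vw}\lambda_{ghv}\lambda_{ijw},$$
$$\gamma_{rstu} = \lambda^{rg}\lambda^{sh}\lambda^{ti}\lambda^{uj}\lambda_{ghij},$$
and where $[\lambda^{rs}]$ is the inverse matrix of $[\lambda_{rs}]$. We have here taken the expansion as far as terms of order $n^{-1}$. As an explanation of the way in which (3) is obtained we note that
$$L^{rs} = \lambda^{rs} - \lambda^{rt}\lambda^{su}l_{tu} + \lambda^{rt}\lambda^{su}\lambda^{vw}l_{tv}l_{uw} - \ldots,$$
and that
$$A_{rst}l_rl_sl_t = \lambda^{ru}\lambda^{sv}\lambda^{tw}(\lambda_{uvw} + l_{uvw})l_rl_sl_t - 3\lambda^{rv}\lambda^{ug}\lambda^{sh}\lambda^{ti}\lambda_{ghi}l_{uv}l_rl_sl_t + \ldots$$
$$= \alpha_{rst}l_rl_sl_t + \lambda^{ru}\lambda^{sv}\lambda^{tw}l_rl_sl_tl_{uvw} - 3\lambda^{rv}\alpha_{stu}l_rl_sl_tl_{uv} + \ldots$$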

3. VARIOUS EXPECTATIONS 


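We now require the expectations of various products of the $l$'s. These are readily obtained by the method given in § 2 of II. We shall suppose in future that $H_0$ is true and that each $\theta_r$ takes its true value $\theta_r^0$. Using the notation
$$(\lambda_{rs})_t = \partial\lambda_{rs}/\partial\theta_t, \qquad (\lambda_{rst})_u = \partial\lambda_{rst}/\partial\theta_u,$$
$$(\lambda_{rs})_{tu} = \partial^2\lambda_{rs}/\partial\theta_t\partial\theta_u, \qquad \kappa_{(rs)(tu)} = E(l_{rs}l_{tu}),$$
we find that
$$E(l_rl_s) = -\lambda_{rs},$$
$$E(l_rl_{st}) = -\lambda_{rst} + (\lambda_{st})_r,$$
$$E(l_rl_{stu}) = -\lambda_{rstu} + (\lambda_{stu})_r,$$
$$E(l_rl_sl_t) = 2\lambda_{rst} - \sum_{(3)}(\lambda_{rs})_t,$$
$$E(l_rl_sl_{tu}) = \lambda_{rstu} - (\lambda_{stu})_r - (\lambda_{rtu})_s + (\lambda_{tu})_{rs} - \kappa_{(rs)(tu)},$$
$$E(l_rl_sl_tl_u) = \sum_{(3)}(\lambda_{rs}\lambda_{tu}) - 3\lambda_{rstu} + 2\sum_{(4)}(\lambda_{rst})_u - \sum_{(6)}(\lambda_{rs})_{tu} + \sum_{(3)}\kappa_{(rs)(tu)},$$
where, for example,
$$\sum_{(3)}(\lambda_{rs})_t = (\lambda_{rs})_t + (\lambda_{rt})_s + (\lambda_{st})_r.$$
Other expectations, in which we neglect terms of order $n$, are given by
$$E(l_rl_sl_tl_{uv}) = \sum_{(3)}\lambda_{rs}\{\lambda_{tuv} - (\lambda_{uv})_t\} \quad (r, s, t \text{ permuted}),$$
$$E(l_rl_sl_tl_{uvw}) = \sum_{(3)}\lambda_{rs}\{\lambda_{tuvw} - (\lambda_{uvw})_t\} \quad (r, s, t \text{ permuted}),$$
$$E(l_rl_sl_{tu}l_{vw}) = \{\lambda_{rtu} - (\lambda_{tu})_r\}\{\lambda_{svw} - (\lambda_{vw})_s\} + \{\lambda_{rvw} - (\lambda_{vw})_r\}\{\lambda_{stu} - (\lambda_{tu})_s\} - \lambda_{rs}\kappa_{(tu)(vw)}.$$
By making use of these results we find the expectation of $2(L^{(k)} - L)$, as given by (3), to be $k + \epsilon_k + O(n^{-2})$, where $\epsilon_k$ is of order $n^{-1}$ and is given by
$$\epsilon_k = \lambda^{rs}\lambda^{tu}\{\tfrac{1}{4}\lambda_{rstu} - (\lambda_{rst})_u + (\lambda_{rt})_{su}\} - \lambda^{rs}\lambda^{tu}\lambda^{vw}\{\tfrac{1}{6}\lambda_{rtv}\lambda_{suw} + \tfrac{1}{4}\lambda_{rtu}\lambda_{svw}$$
$$\qquad - \lambda_{rtv}(\lambda_{sw})_u - \lambda_{rtu}(\lambda_{sw})_v + (\lambda_{rt})_v(\lambda_{sw})_u + (\lambda_{rt})_u(\lambda_{sw})_v\}. \tag{4}$$
Here, as before, all suffices are summed over the values 1 to $k$. The expectation of the criterion $2(L^{(p+q)} - L^{(p)})$ may be written as
$$q + \epsilon_{p+q} - \epsilon_p + O(n^{-2}).$$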
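In order to simplify the ensuing algebra we shall now assume, which can be done without loss of generality, that the parameters were originally chosen such that, for all unequal $r$ and $s$, $\lambda_{rs} = 0 = \lambda^{rs}$ (when the $\theta$'s are given their true values). We can then write, discarding temporarily the summation convention,
$$\epsilon_k = \sum_{r,s=1}^{k}\{\tfrac{1}{4}\lambda_{rrss} - (\lambda_{rrs})_s + (\lambda_{rs})_{rs}\}/(\lambda_{rr}\lambda_{ss}) - \sum_{r,s,t=1}^{k}\{\tfrac{1}{6}\lambda_{rst}^2 + \tfrac{1}{4}\lambda_{rss}\lambda_{rtt}$$
$$\qquad - \lambda_{rst}(\lambda_{rt})_s - \lambda_{rss}(\lambda_{rt})_t + (\lambda_{rs})_t(\lambda_{rt})_s + (\lambda_{rs})_s(\lambda_{rt})_t\}/(\lambda_{rr}\lambda_{ss}\lambda_{tt}). \tag{5}$$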
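Expression (5) lends itself to direct numerical evaluation. The sketch below reads (5) as $\epsilon_k = \sum_{r,s}\{\tfrac14\lambda_{rrss} - (\lambda_{rrs})_s + (\lambda_{rs})_{rs}\}/(\lambda_{rr}\lambda_{ss}) - \sum_{r,s,t}\{\tfrac16\lambda_{rst}^2 + \tfrac14\lambda_{rss}\lambda_{rtt} - \lambda_{rst}(\lambda_{rt})_s - \lambda_{rss}(\lambda_{rt})_t + (\lambda_{rs})_t(\lambda_{rt})_s + (\lambda_{rs})_s(\lambda_{rt})_t\}/(\lambda_{rr}\lambda_{ss}\lambda_{tt})$ and applies it to a normal sample with variance $\theta_1$ (nuisance) and mean $\theta_2$ (tested). The function names and the callable-based interface are assumptions made for the illustration; the expected values $\epsilon_1 = 1/(3n)$ and $\epsilon_2 - \epsilon_1 = 3/(2n)$ are those quoted in the examples of the paper.

```python
from itertools import product


def epsilon_k(k, lam_rr, lam3, lam4, d_lam2, d_lam3, dd_lam2):
    """Evaluate (5) for orthogonal parameters (lambda_rs = 0 for r != s).

    lam_rr(r)           -> lambda_rr
    lam3(r, s, t)       -> lambda_rst
    lam4(r, s, t, u)    -> lambda_rstu
    d_lam2(r, s, t)     -> (lambda_rs)_t
    d_lam3(r, s, t, u)  -> (lambda_rst)_u
    dd_lam2(r, s, t, u) -> (lambda_rs)_tu
    """
    idx = range(1, k + 1)
    total = 0.0
    for r, s in product(idx, repeat=2):
        num = 0.25 * lam4(r, r, s, s) - d_lam3(r, r, s, s) + dd_lam2(r, s, r, s)
        total += num / (lam_rr(r) * lam_rr(s))
    for r, s, t in product(idx, repeat=3):
        num = (lam3(r, s, t) ** 2 / 6.0 + 0.25 * lam3(r, s, s) * lam3(r, t, t)
               - lam3(r, s, t) * d_lam2(r, t, s) - lam3(r, s, s) * d_lam2(r, t, t)
               + d_lam2(r, s, t) * d_lam2(r, t, s) + d_lam2(r, s, s) * d_lam2(r, t, t))
        total -= num / (lam_rr(r) * lam_rr(s) * lam_rr(t))
    return total


def normal_sample_lambdas(n, v):
    """Expected derivatives for a normal sample: theta_1 = variance v, theta_2 = mean."""
    def key(*ix):
        return tuple(sorted(ix))

    def lam_rr(r):
        return -n / (2 * v**2) if r == 1 else -n / v

    def lam3(r, s, t):
        return {(1, 1, 1): 2 * n / v**3, (1, 2, 2): n / v**2}.get(key(r, s, t), 0.0)

    def lam4(r, s, t, u):
        return {(1, 1, 1, 1): -9 * n / v**4, (1, 1, 2, 2): -2 * n / v**3}.get(key(r, s, t, u), 0.0)

    def d_lam2(r, s, t):      # nothing depends on the mean, so only t = 1 contributes
        return {(1, 1): n / v**3, (2, 2): n / v**2}.get(key(r, s), 0.0) if t == 1 else 0.0

    def d_lam3(r, s, t, u):
        return {(1, 1, 1): -6 * n / v**4, (1, 2, 2): -2 * n / v**3}.get(key(r, s, t), 0.0) if u == 1 else 0.0

    def dd_lam2(r, s, t, u):
        return {(1, 1): -3 * n / v**4, (2, 2): -2 * n / v**3}.get(key(r, s), 0.0) if (t, u) == (1, 1) else 0.0

    return lam_rr, lam3, lam4, d_lam2, d_lam3, dd_lam2


if __name__ == "__main__":
    n, v = 20, 1.7
    lams = normal_sample_lambdas(n, v)
    eps1, eps2 = epsilon_k(1, *lams), epsilon_k(2, *lams)
    print(eps1, 1 / (3 * n))
    print(eps2 - eps1, 3 / (2 * n))
```

The result does not depend on the value chosen for the variance, as it should not, since the correction is a pure function of $n$ in this case.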
The further simplification will be made, again without loss of generality, of putting, for all $r$, $\lambda_{rr} = -n$, $\lambda^{rr} = -n^{-1}$ (for true values of the $\theta$'s). This enables us to write, in place of (3),
$$2(L^{(k)} - L) = \frac{1}{n}l_rl_r + \frac{1}{3n^3}\lambda_{rst}l_rl_sl_t + \frac{1}{n^2}l_rl_sl_{rs} + \frac{1}{12n^4}\lambda_{rstu}l_rl_sl_tl_u$$
$$\qquad + \frac{1}{4n^5}\lambda_{rsv}\lambda_{tuv}l_rl_sl_tl_u + \frac{1}{n^4}\lambda_{rst}l_rl_sl_ul_{tu} + \frac{1}{3n^3}l_rl_sl_tl_{rst} + \frac{1}{n^3}l_rl_sl_{rt}l_{st}, \tag{6}$$
ignoring terms of more than the fourth degree in the $l$'s. A few words of explanation are desirable at this point. While the notation $\lambda_{rs}$ is unambiguous, the meaning of $\lambda^{rs}$ depends, in general, on the order $k$ of the matrix which is being inverted. Nevertheless, if at the outset we choose the $\theta$'s such that $\lambda_{rr} = -n$ and $\lambda_{rs} = 0$ $(r \ne s)$ for $r, s = 1, 2, \ldots, p + q$, then $\lambda^{rr} = -n^{-1}$ and $\lambda^{rs} = 0$ whatever the value of $k$. We can, for example, make the choice of $\theta$'s by means of a certain linear transformation. If the original set of $\theta$'s do not satisfy the above conditions we may replace each $\theta_r$ by $\theta_r' = \sum_{s \ge r} a_{rs}\theta_s$, with suitable values of the $a_{rs}$. By this means we obtain a set of $\theta$'s which makes (6) true for all $k$. These remarks are relevant to the definition of the quantities $m_r^2$ in the next section.


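4. JOINT CUMULANTS OF THE $m_r$
Now define quantities $m_1^2, m_2^2, \ldots, m_{p+q}^2$ by
$$m_1^2 = 2(L^{(1)} - L),$$
$$m_r^2 = 2(L^{(r)} - L^{(r-1)}) \quad (r > 1).$$
Then
$$2(L^{(k)} - L) = \sum_{r=1}^{k} m_r^2,$$
and the criterion in which we are interested may be written as $\sum_{r=p+1}^{p+q} m_r^2$. It is readily found that $m_r$ is given (with suitable choice of sign) as far as terms of the second degree in the $l$'s by
$$m_r = n^{-1/2}l_r + n^{-5/2}\Bigl\{\tfrac{1}{6}\lambda_{rrr}l_r^2 + \tfrac{1}{2}\sum_{s<r}\lambda_{rrs}l_rl_s + \tfrac{1}{2}\sum_{s,t<r}\lambda_{rst}l_sl_t\Bigr\} + n^{-3/2}\Bigl\{\tfrac{1}{2}l_rl_{rr} + \sum_{s<r}l_sl_{rs}\Bigr\}, \tag{7}$$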


where we drop the summation convention, and continue to do so until further notice.

We now establish certain results concerning the joint moments and cumulants of the $m_r$. We shall denote these by $\mu_r'$, $\mu_{rs}'$, $\mu_{rst}'$, ... (with or without dashes) and by $\kappa_r$, $\kappa_{rs}$, $\kappa_{rst}$, .... In our notation $\mu_{rs} = \kappa_{rs}$ is, for example, the covariance between $m_r$ and $m_s$. For the mean of $m_r$ we have
$$\kappa_r = \mu_r' = n^{-3/2}\Bigl\{\tfrac{1}{6}\lambda_{rrr} + \tfrac{1}{2}\sum_{s<r}\lambda_{rss}\Bigr\} + \tfrac{1}{2}n^{-3/2}\{-\lambda_{rrr} + (\lambda_{rr})_r\} + n^{-3/2}\sum_{s<r}\{-\lambda_{rss} + (\lambda_{rs})_s\} + O(n^{-3/2}).$$
Hence
$$\mu_r' = n^{-3/2}\{-\tfrac{1}{3}\lambda_{rrr} + \tfrac{1}{2}(\lambda_{rr})_r\} + n^{-3/2}\sum_{s<r}\{-\tfrac{1}{2}\lambda_{rss} + (\lambda_{rs})_s\} + O(n^{-3/2}).$$
Since the expectation of $\sum_{r=1}^{k} m_r^2$ is $k + \epsilon_k + O(n^{-2})$, where $\epsilon_k$ is given by (5) with $\lambda_{rr} = -n$, we have $\mu_{rr}' = 1 + \epsilon_r - \epsilon_{r-1} + O(n^{-2})$. It is unnecessary for us to evaluate $\mu_{rs}'$ $(r \ne s)$. We need only observe that both $\mu_{rs}'$ and $\mu_{rs} = \kappa_{rs}$ are of order $n^{-1}$.
For the third-order moments and cumulants we have, first, that $n^{3/2}\mu_{rrr}'$ is the expectation of
$$l_r^3 + \frac{1}{2n^2}\lambda_{rrr}l_r^4 + \frac{3}{2n^2}\sum_{s<r}\lambda_{rrs}l_r^3l_s + \frac{3}{2n^2}\sum_{s,t<r}\lambda_{rst}l_r^2l_sl_t + \frac{3}{2n}l_r^3l_{rr} + \frac{3}{n}\sum_{s<r}l_r^2l_sl_{rs},$$
and thus
$$\mu_{rrr}' = n^{-3/2}\{-\lambda_{rrr} + \tfrac{3}{2}(\lambda_{rr})_r\} + 3n^{-3/2}\sum_{s<r}\{-\tfrac{1}{2}\lambda_{rss} + (\lambda_{rs})_s\} + O(n^{-3/2}) = 3\mu_r' + O(n^{-3/2}).$$
Similarly we find
$$\mu_{rrs}' = \mu_s' + O(n^{-3/2}) \quad (r \ne s),$$
$$\mu_{rst}' = O(n^{-3/2}) \quad (r, s, t \text{ all unequal}).$$
Hence, for all values of $r$, $s$ and $t$, $\kappa_{rst} = \mu_{rst} = O(n^{-3/2})$.
We must next show that the fourth-order cumulants $\kappa_{rrrr}$ and $\kappa_{rrss}$ are of order $n^{-2}$. The method employed for doing this involves exceedingly complicated and laborious algebra, and has the disadvantage of making the final result appear miraculous! We have not, however, been able to discover a better method. We shall find the covariance of $\sum_{i=1}^{k} m_i^2$, as given by (6), with $\sum_{s=1}^{r} m_s^2$ $(r > k)$, a similar expression in which, however, all suffices are summed over the values 1 to $r$. We must retain all terms of order $n^{-1}$. It is hardly desirable to reproduce all the algebra, but it may help to give a few illustrations. We reintroduce the summation convention with the understanding that the suffices $g$, $h$, $i$ and $j$ are to be summed from 1 to $k$, while the suffices $s$, $t$ and $u$ are summed from 1 to $r$ $(> k)$. Four of the terms which arise are:

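$$\frac{1}{n^2}\operatorname{cov}\{(l_sl_s), (l_il_i)\} = 2k + \frac{1}{n^2}\{-3\lambda_{ssii} + 4(\lambda_{ssi})_i + 4(\lambda_{iis})_s - (\lambda_{ss})_{ii} - (\lambda_{ii})_{ss} - 4(\lambda_{si})_{si} + \kappa_{(ss)(ii)} + 2\kappa_{(si)(si)}\},$$

$$\frac{1}{3n^4}\lambda_{ghi}\operatorname{cov}\{(l_sl_s), (l_gl_hl_i)\} = \frac{1}{n^3}\lambda_{ghi}\{4\lambda_{ghi} - 6(\lambda_{gh})_i\} + \frac{1}{n^3}\lambda_{iij}\{2\lambda_{ssj} - 2(\lambda_{sj})_s - (\lambda_{ss})_j\},$$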


1, 
ni Aonj cov {(l, 1,), ( (J geal i L,;)} 


4 
sae Aan + Noni + (Agi )it + ne Aagit wi Anni a (Ansat 


1 
395 tatu COV {(lslsly), (1,1;1;;)} 
b 
3 Meath — Avia + (Asa)at + 5 Asast — Avag + (A ij +5 click Asig + (Aig) sh 


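There are altogether nineteen such terms. Their sum is found to reduce to
$$2k + \frac{1}{n^2}\{\lambda_{iijj} - 4(\lambda_{iij})_j + 4(\lambda_{ij})_{ij}\} + \frac{1}{n^3}\{\tfrac{2}{3}\lambda_{ghi}\lambda_{ghi} + \lambda_{ggi}\lambda_{hhi}$$
$$\qquad - 4\lambda_{ghi}(\lambda_{gi})_h - 4\lambda_{ggi}(\lambda_{hi})_h + 4(\lambda_{gi})_h(\lambda_{hi})_g + 4(\lambda_{gi})_g(\lambda_{hi})_h\}.$$
Hence the covariance between $\sum_{s=1}^{r} m_s^2$ and $\sum_{i=1}^{k} m_i^2$ $(r > k)$ is $2k + 4\epsilon_k + O(n^{-2})$. From this it is easily deduced that
$$\operatorname{var}(m_r^2) = 2 + 4(\epsilon_r - \epsilon_{r-1}) + O(n^{-2}),$$
$$\operatorname{cov}(m_r^2, m_s^2) = O(n^{-2}) \quad (r \ne s),$$
and that
$$\mu_{rrrr}' = 3\mu_{rr}'^2 + O(n^{-2}), \qquad \kappa_{rrrr} = O(n^{-2}),$$
$$\mu_{rrss}' = \mu_{rr}'\mu_{ss}' + O(n^{-2}), \qquad \kappa_{rrss} = O(n^{-2}) \quad (r \ne s).$$
Since the leading term of $m_r$ is $l_r/\sqrt{n}$, it is easy to show that all other fourth-order cumulants are certainly $O(n^{-1})$ or less. Similarly, regarding the cumulants of higher order we need only remark that those of the fifth order are not more than $O(n^{-3/2})$, those of the sixth order not more than $O(n^{-2})$, and so on.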













5. APPROXIMATE DISTRIBUTION OF THE CRITERION 


Now consider the expectation of $\prod_{i=p+1}^{p+q}(m_i^{2p_i})$, where the $p_i$ are non-negative integers, and suppose that it is expressed in terms of the joint cumulants of the $m_i$. Then, bearing in mind the orders of magnitude of these cumulants, it is clear that the only terms which matter are, first, that in $\prod_i(\kappa_{ii}^{p_i})$, which has a coefficient of $\prod_i\Bigl\{\dfrac{(2p_i)!}{2^{p_i}p_i!}\Bigr\}$, and, secondly, those in which one of the powers $\kappa_{ii}^{p_i}$ is replaced by $\kappa_{ii}^{p_i-1}\kappa_i^2$, with a coefficient $p_i$ times as large. All other terms are $O(n^{-2})$ or less. Since $\mu_{ii}' = \kappa_{ii} + \kappa_i^2$, it follows that, neglecting quantities of order $n^{-2}$, the expectation of $\prod_i(m_i^{2p_i})$ is given by
$$\prod_i\Bigl\{\frac{(2p_i)!}{2^{p_i}p_i!}\,\mu_{ii}'^{\,p_i}\Bigr\}.$$
In view of this it is clear that, to the same order of approximation, the moments of the criterion $\sum_{i=p+1}^{p+q} m_i^2$ are the same as those of
$$a_{p+1}z_1 + a_{p+2}z_2 + \ldots + a_{p+q}z_q,$$
where $a_i = \mu_{ii}' = 1 + \epsilon_i - \epsilon_{i-1}$, and where $z_1, z_2, \ldots, z_q$ are independent $\chi^2$ variates each having one degree of freedom. So the $p$th cumulant of the criterion (assuming $n$ large compared with $p$) is
$$2^{p-1}(p - 1)!\sum_{i=p+1}^{p+q} a_i^p = 2^{p-1}(p - 1)!\,\{q + p(\epsilon_{p+q} - \epsilon_p)\} + O(n^{-2})$$
$$= 2^{p-1}(p - 1)!\,q\Bigl\{1 + \frac{1}{q}(\epsilon_{p+q} - \epsilon_p)\Bigr\}^p + O(n^{-2}).$$
Hence, finally, either
$$2q(L^{(p+q)} - L^{(p)})/(q + \epsilon_{p+q} - \epsilon_p)$$
or
$$2\Bigl\{1 - \frac{1}{q}(\epsilon_{p+q} - \epsilon_p)\Bigr\}(L^{(p+q)} - L^{(p)})$$
has the same moments as $\chi^2$ with $q$ degrees of freedom, neglecting quantities of order $n^{-2}$.
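The two equivalent forms of the scaled criterion can be written down directly. In the sketch below `w` stands for the unscaled criterion $2(L^{(p+q)} - L^{(p)})$ and `eps_diff` for $\epsilon_{p+q} - \epsilon_p$; both names, like the function names, are hypothetical.

```python
def scaled_criterion(w: float, q: int, eps_diff: float) -> float:
    """First form: q * w / (q + eps_diff)."""
    return q * w / (q + eps_diff)


def scaled_criterion_alt(w: float, q: int, eps_diff: float) -> float:
    """Second form: (1 - eps_diff / q) * w, equal to the first up to order n**-2."""
    return (1.0 - eps_diff / q) * w


if __name__ == "__main__":
    w, q, eps_diff = 5.0, 2, 0.1
    print(scaled_criterion(w, q, eps_diff))
    print(scaled_criterion_alt(w, q, eps_diff))
```

The two forms differ by a factor $1 - (\epsilon_{p+q} - \epsilon_p)^2/q^2$, which is of order $n^{-2}$ and therefore negligible to the accuracy claimed.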

Though the quantity $\epsilon_k$ is given either by (4) or by (5), these expressions would not as a rule be of much practical use, particularly if the number of unknown parameters were large. In particular cases it would generally be much easier to find $q + \epsilon_{p+q} - \epsilon_p$ by evaluating directly the expectation of the criterion as far as terms of order $n^{-1}$. This method was adopted in a previous paper (Lawley, 1956) when testing hypotheses regarding the latent roots of a covariance matrix. It can be shown that both the hypotheses and the criteria employed to test them were of the same type as those considered in this paper. In general $\epsilon_{p+q} - \epsilon_p$ will be a function of the unknown parameters, but the substitution of estimated values for these will clearly not affect the order of the approximation.

We end with three examples. In both of the first two the criterion belongs to the class considered by Box, and the results are not new, but our main object here is to provide some slight verification of expression (5).

Example I. Suppose that we have a random sample of size $n$ from a normal population. We consider the well-known problem of testing the hypothesis that the mean $\theta_2$ has a specified value, while the variance $\theta_1$ is unknown. In this case
$$L = -\tfrac{1}{2}n\log\theta_1 - \frac{1}{2\theta_1}\{n(\bar x - \theta_2)^2 + (n - 1)s^2\},$$
where $\bar x$ is the sample mean and $s^2$ is the usual estimate of $\theta_1$. The criterion is
$$2(L^{(2)} - L^{(1)}) = n\log\Bigl(1 + \frac{t^2}{n - 1}\Bigr),$$
where $t = (\bar x - \theta_2)\sqrt{n}/s$.
Differentiating $L$ we find, omitting various zero quantities,
$$\lambda_{11} = -\frac{n}{2\theta_1^2}, \qquad \lambda_{22} = -\frac{n}{\theta_1},$$
$$\lambda_{122} = \frac{n}{\theta_1^2} = (\lambda_{22})_1, \qquad \lambda_{1122} = -\frac{2n}{\theta_1^3} = (\lambda_{122})_1.$$
Since $\lambda_{12} = 0$, we are able to use (5) and hence obtain $1 + \epsilon_2 - \epsilon_1 = 1 + 3/(2n)$, which is the right value for the expectation of the criterion. The corrected criterion is therefore
$$\chi^2 = (n - \tfrac{3}{2})\log\Bigl(1 + \frac{t^2}{n - 1}\Bigr),$$
which may be expanded as
$$t^2\Bigl(1 - \frac{1 + t^2}{2\nu} + \frac{3t^2 + 4t^4}{12\nu^2} - \ldots\Bigr),$$
where $\nu = n - 1$. This happens to give the 5 and 1% significance levels fairly accurately even for $n = 6$, and it agrees, as far as the term in $\nu^{-1}$, with the correct expansion, which is known to be
$$t^2\Bigl(1 - \frac{1 + t^2}{2\nu} + \frac{3 + 7t^2 + 8t^4}{24\nu^2} - \ldots\Bigr).$$
This may be obtained by inverting the expansion given by Fisher (1941).
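The corrected criterion of Example I, read as $(n - \tfrac32)\log\{1 + t^2/(n - 1)\}$, is easily computed from a sample. The sketch below assumes the data are given as a list of floats and that the hypothesised mean is passed as `mean0`; both function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def t_statistic(sample: Sequence[float], mean0: float) -> float:
    """t = (x_bar - mean0) * sqrt(n) / s, with s**2 the usual unbiased variance estimate."""
    n = len(sample)
    x_bar = sum(sample) / n
    s2 = sum((x - x_bar) ** 2 for x in sample) / (n - 1)
    return (x_bar - mean0) * math.sqrt(n) / math.sqrt(s2)


def criteria(n: int, t: float) -> Tuple[float, float]:
    """Return (uncorrected, corrected) criteria for the test of a normal mean."""
    log_term = math.log(1.0 + t * t / (n - 1))
    return n * log_term, (n - 1.5) * log_term


if __name__ == "__main__":
    data = [4.1, 5.3, 4.8, 5.9, 5.0, 4.6]
    t = t_statistic(data, mean0=4.5)
    raw, corrected = criteria(len(data), t)
    print(t, raw, corrected)   # the corrected value is referred to chi-square with 1 d.f.
```

Replacing the multiplier $n$ by $n - \tfrac32$ is the same, to order $n^{-2}$, as dividing by the expectation factor $1 + 3/(2n)$.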

Example II. As a second example, suppose that we have $k$ independent variance estimates $s_i^2$ with expectations $\sigma_i^2$ and degrees of freedom $n_i$ ($i = 1, 2, \dots, k$). Then (assuming normal parent populations) we may take $L$ to be

$$-\tfrac{1}{2}\sum_{i=1}^{k} n_i\{\log(\sigma_i^2) + s_i^2/\sigma_i^2\}.$$

We assume that the $n_i$ are all of the same order of magnitude and that quantities of order $n_i^{-2}$ can be ignored. The hypothesis $H_0$ to be tested is that $\sigma_1^2 = \sigma_2^2 = \dots = \sigma_k^2$, and our criterion is $2(L^{(k)} - L^{(1)})$, where $L^{(k)}$ and $L^{(1)}$ have to be determined.

For $L^{(1)}$ it is convenient to define the parameters $\theta_i$ by

$$\theta_1 = \sigma_1^2, \qquad \theta_i = \sigma_i^2 - \sigma_1^2 \quad (i > 1).$$

Then, assuming $H_0$ to be true, the true values of $\theta_2, \dots, \theta_k$ are all zero, and $L$ becomes

$$-\tfrac{1}{2}n(\log\theta_1 + s^2/\theta_1),$$

where $n = \sum_{i=1}^{k} n_i$, $s^2 = \sum (n_i s_i^2)/n$.

Maximizing this with respect to $\theta_1$, we obtain

$$L^{(1)} = -\tfrac{1}{2}n\{\log(s^2) + 1\}, \qquad \hat{\theta}_1 = s^2.$$








Differentiation of $L$ with respect to $\theta_1$ gives
n 2n n 
Ay = 203” Ayn = 8’ (Au) = 63” 


Anu = a (Ayu)1 = 6 » Anda = ~F 


Hence use of (5) gives the expectation of $2(L^{(1)} - L)$ as $1+\epsilon_1 = 1 + 1/(3n)$.
Since $L^{(k)}$ is the result of maximizing $L$ with respect to all the parameters it is in this case more convenient to define $\theta_i$ as $\sigma_i^2$, for all $i$. We then have

$$L^{(k)} = -\tfrac{1}{2}\sum_i n_i\log(s_i^2) - \tfrac{1}{2}n,$$


2n; 

and also Au=—-sm Au= qm, ete. 
v 262 vie 7 

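Hence, from (5), the expectation of $2(L^{(k)} - L)$ is

$$k+\epsilon_k = k + \tfrac{1}{3}\sum_i\left(\frac{1}{n_i}\right).$$

Thus the criterion, which is

$$n\log(s^2) - \sum_i n_i\log(s_i^2),$$

has an expectation of $(k-1) + \tfrac{1}{3}\left(\sum_i \frac{1}{n_i} - \frac{1}{n}\right)$. The $\chi^2$ test is improved by using

$$1 + \frac{1}{3(k-1)}\left(\sum_i\frac{1}{n_i} - \frac{1}{n}\right)$$

as a divisor for the criterion, as first established by Bartlett (1937).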
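As an illustration of Example II, the criterion and its divisor can be computed as follows. This is a minimal sketch with hypothetical names (`bartlett_criterion`, `variances`, `dfs`); it follows the formulas of the text directly and is not taken from the original paper.

```python
import math

def bartlett_criterion(variances, dfs):
    """Return (criterion, divisor, corrected criterion) for k variance estimates.

    variances: the estimates s_i^2; dfs: their degrees of freedom n_i.
    """
    n = sum(dfs)
    k = len(dfs)
    pooled = sum(ni * si2 for ni, si2 in zip(dfs, variances)) / n   # s^2
    raw = n * math.log(pooled) - sum(ni * math.log(si2) for ni, si2 in zip(dfs, variances))
    divisor = 1 + (sum(1 / ni for ni in dfs) - 1 / n) / (3 * (k - 1))
    return raw, divisor, raw / divisor
```

The corrected value is referred to $\chi^2$ with $k-1$ degrees of freedom.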

Example III. Lastly we consider the hypothesis that the correlation coefficient $\rho$ ($= \theta_3$) in a bivariate normal distribution has a specified value, the two standard deviations $\theta_1$ and $\theta_2$ being nuisance parameters. This example was considered by Bartlett in the third of his papers, already referred to, on approximate confidence intervals. We shall take the approximation a further stage. If $r$, $s_1$ and $s_2$ are the usual estimates of $\rho$, $\theta_1$ and $\theta_2$ respectively, obtained from a sample of size $n+1$, then, omitting a constant, we have

$$L = -n\log\{\theta_1\theta_2\sqrt{(1-\rho^2)}\} - \frac{n}{2(1-\rho^2)}\left(\frac{s_1^2}{\theta_1^2} - \frac{2\rho r s_1 s_2}{\theta_1\theta_2} + \frac{s_2^2}{\theta_2^2}\right).$$

Our criterion is easily found to be given by

$$2(L^{(3)} - L^{(2)}) = n\log\frac{(1-\rho r)^2}{(1-\rho^2)(1-r^2)}.$$

In this case it would be exceedingly laborious to calculate the expectation by differentiating $L$, and we therefore find it more directly. Expansion of the above expression in powers of $x = (r-\rho)/(1-\rho^2)$ gives

$$n\log\{1 + x^2 + 2\rho x^3 + (1+3\rho^2)x^4 + \dots\} = n\{x^2 + 2\rho x^3 + \tfrac{1}{2}(1+6\rho^2)x^4 + \dots\}.$$

Making use of results obtained by Hotelling (1953) we have, neglecting quantities of order $n^{-3}$,

$$E(x^2) = \frac{1}{n} + \frac{23\rho^2}{4n^2}, \qquad E(x^3) = -\frac{15\rho}{2n^2}, \qquad E(x^4) = \frac{3}{n^2}.$$














Hence the required expectation is $1 + (6-\rho^2)/(4n) + O(n^{-2})$. For an improved criterion we use, in place of $n$, the multiplying factor $n - \tfrac{1}{4}(6-\rho^2)$.

We may expand in powers of $z-\zeta$, instead of $x$, where

$$z = \tfrac{1}{2}\log\frac{1+r}{1-r}, \qquad \zeta = \tfrac{1}{2}\log\frac{1+\rho}{1-\rho}.$$

To the usual order of approximation we then have that

$$\{n-\tfrac{1}{4}(6-\rho^2)\}\{(z-\zeta)^2 - \tfrac{1}{6}(z-\zeta)^4\}$$

is distributed as $\chi^2$ with one degree of freedom; and it may easily be verified that the variate $(z-\zeta) - \tfrac{1}{12}(z-\zeta)^3$ has cumulants given by

$$\kappa_1 = \frac{\rho}{2n} + O(n^{-2}), \qquad \kappa_2 = \frac{1}{n} + \frac{1}{2n^2}(3-\rho^2) + O(n^{-3}),$$

$$\kappa_3 = O(n^{-3}), \qquad \kappa_4 = O(n^{-4}).$$


This could be used to provide confidence limits for p, but no practical improvement on the 
ordinary use of z for this purpose would be achieved. 
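The criterion of Example III and its corrected form are easily computed. The sketch below is illustrative only and its function names are hypothetical. It also uses the identity, easily verified by writing $r = \tanh z$ and $\rho = \tanh\zeta$, that the criterion equals $2n\log\cosh(z-\zeta)$, from which the expansion in powers of $z-\zeta$ follows.

```python
import math

def lr_criterion(r, rho, n):
    """n * log[(1 - rho*r)^2 / ((1 - rho^2)(1 - r^2))]."""
    return n * math.log((1 - rho * r) ** 2 / ((1 - rho ** 2) * (1 - r ** 2)))

def fisher_z(r):
    """z = (1/2) log[(1 + r)/(1 - r)]."""
    return 0.5 * math.log((1 + r) / (1 - r))

def corrected_criterion(r, rho, n):
    """Criterion with n replaced by the multiplying factor n - (6 - rho^2)/4."""
    return (n - 0.25 * (6 - rho ** 2)) * lr_criterion(r, rho, 1)
```

The corrected value is referred to $\chi^2$ with one degree of freedom.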


My thanks are due to Mr D. V. Lindley for his helpful suggestions for clarifying the 
argument in various places. 


REFERENCES 


Bartlett, M. S. (1937). Properties of sufficiency and statistical tests. Proc. Roy. Soc. A, 160, 268.

Bartlett, M. S. (1953a). Approximate confidence intervals. I. Biometrika, 40, 12.

Bartlett, M. S. (1953b). Approximate confidence intervals. II. More than one unknown parameter. Biometrika, 40, 306.

Bartlett, M. S. (1954). A note on the multiplying factors in various $\chi^2$ approximations. J. R. Statist. Soc. B, 16, 296.

Bartlett, M. S. (1955). Approximate confidence intervals. III. A bias correction. Biometrika, 42, 201.

Box, G. E. P. (1949). A general distribution theory for a class of likelihood criteria. Biometrika, 36, 317.

Fisher, R. A. (1941). The asymptotic approach to Behrens's integral, with further tables for the d test of significance. Ann. Eugen., Lond., 11, 141.

Hotelling, H. (1953). New light on the correlation coefficient and its transforms. J. R. Statist. Soc. B, 15, 193.

Lawley, D. N. (1956). Tests of significance for the latent roots of covariance and correlation matrices. Biometrika, 43, 128.

Neyman, J. & Pearson, E. S. (1928). On the use and interpretation of certain test criteria for purposes of statistical inference. Biometrika, 20A, 175 and 263.

Wilks, S. S. (1938). The large-sample distribution of the likelihood ratio for testing composite hypotheses. Ann. Math. Statist. 9, 60.













ON THE ACCURACY OF WEIGHTED MEANS AND RATIOS 


By G. S. JAMES
University of Leeds


1. INTRODUCTION


The basic problem we consider is as follows. Suppose $x_1, \dots, x_k$ are $k$ quantities, derived from observations, which are independently and normally distributed about the same mean $\mu$, but with possibly different variances $\lambda_1\sigma_1^2, \dots, \lambda_k\sigma_k^2$. The positive constants $\lambda_i$ are known. The $\sigma_i^2$ are unknown, but estimates $s_i^2$ are available which are distributed independently of each other and of the $x_i$ in the usual mean-square forms with $\nu_i$ degrees of freedom ($\nu_i$ known). We require to find confidence limits for $\mu$. For example, the $x_i$ may be the means of $k$ samples (sizes $n_1, \dots, n_k$) drawn from normal populations having the same mean $\mu$, but perhaps different variances $\sigma_1^2, \dots, \sigma_k^2$. The $s_i^2$ are then the sample variances defined with degrees of freedom $\nu_i = n_i - 1$, and the $\lambda_i$ are the reciprocals of the $n_i$. Again, the $x_i$ might be the estimated slopes of $k$ regression lines, the true lines being known to have the same slope $\mu$, but the residual variances $\sigma_i^2$ being possibly unequal. The $s_i^2$ are now the usual estimates derived from the deviations from the fitted lines, while the $\lambda_i$ are the reciprocals of the sums of squares of deviations of the independent variate.

Let $\omega_i$ denote $1/(\lambda_i\sigma_i^2)$, the reciprocal-variance or weight of $x_i$, and let $w_i$ denote $1/(\lambda_i s_i^2)$. Write $\omega$ for $\Sigma\omega_i$ and $w$ for $\Sigma w_i$. Now if the true weights $\omega_i$ (or their ratios) were known, then the quantity $\tilde{x} = \Sigma\omega_i x_i/\omega$ would provide an estimate of $\mu$ of maximal accuracy (its variance being $\omega^{-1}$). It therefore seems reasonable, in the absence of any firm knowledge of the true weights, to use the estimate $\bar{x} = \Sigma w_i x_i/w$ in its place. Now although $\omega^{1/2}(\tilde{x}-\mu)$ has a standard normal distribution, that of $u = w^{1/2}(\bar{x}-\mu)$ has a form which depends on the ratios of the $\omega_i$, which are unknown. Thus tables of this distribution, even if available, would be of no assistance in finding confidence limits for $\mu$. Nevertheless, it may be possible to find a function $u(r_1, \dots, r_k)$, or $u(r)$ for short, of the ratios $r_i = w_i/w$ ($\Sigma r_i = 1$), which is such that

$$\Pr[\,|u| < u(r)\,] = P, \qquad (1.1)$$

either exactly or approximately, where $P$ is the required confidence coefficient. Confidence limits for $\mu$ are then $\bar{x} \pm u(r)/w^{1/2}$. Of course $u(r)$ will depend in addition on $P$ and on the degrees of freedom $\nu_i$. It is not known for certain whether the functional equation (1.1) does or does not possess an exact solution, but in this paper we present a solution which is asymptotically correct when all the degrees of freedom are large. More specifically we give a function $u(r)$ for which

$$\Pr[\,|u| < u(r)\,] = P + O(\nu^{-4}) \qquad (1.2)$$

and tabulate it for the case $k = 2$.

The ordinary large-sample approximation would be to take $u(r)$ equal to the appropriate point of the standard normal distribution. This is also asymptotically correct, but satisfies (1.2) with $\nu^{-1}$ in place of $\nu^{-4}$.

We also show how the tables of this paper, and certain others originating from work by B. L. Welch, may be used to provide confidence limits for certain quantities which are estimated as ratios.
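The quantities defined in this section can be computed as follows. This is a minimal sketch with hypothetical names; it uses only the ordinary large-sample value of $u(r)$, namely the normal percentage point, and so gives limits that are too narrow when the degrees of freedom are small.

```python
from statistics import NormalDist

def weighted_mean_limits(x, s2, lam, P=0.95):
    """Weighted mean with estimated weights w_i = 1/(lambda_i * s_i^2).

    Returns the estimate, the ratios r_i = w_i/w, and the large-sample
    limits xbar -/+ u0 / sqrt(w), where u0 is the normal percentage point.
    """
    w = [1.0 / (l * v) for l, v in zip(lam, s2)]
    w_tot = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / w_tot
    r = [wi / w_tot for wi in w]
    u0 = NormalDist().inv_cdf(0.5 * (1 + P))
    half = u0 / w_tot ** 0.5
    return xbar, r, (xbar - half, xbar + half)
```

For two sample means of four observations each, one would call `weighted_mean_limits([x1, x2], [s1_sq, s2_sq], [0.25, 0.25])`.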







2. GENERAL DISCUSSION: YATES’S SOLUTION 


In a sense the present problem is dual to that discussed by Welch (1947a, b) and by Trickett & Welch (1954) (see also further references below). Their problem was that of finding confidence limits for a previously specified linear combination, $\eta = \Sigma\beta_i\mu_i$, of $k$ quantities $\mu_i$ (not here assumed equal), the assumptions being otherwise the same as before. (In particular, confidence limits were desired for the difference between the means of two normal populations, whose variances were not assumed to be equal.) More precisely, if $y = \Sigma\beta_i x_i$, $a = \Sigma a_i = \Sigma\beta_i^2\lambda_i s_i^2$ (the estimated variance of $y$), $c_i = a_i/a$ ($\Sigma c_i = 1$) and $v = a^{-1/2}(y-\eta)$, then the problem was to find a function $v(c_1, \dots, c_k) = v(c)$ which was such that

$$\Pr[\,|v| < v(c)\,] = P. \qquad (2.1)$$

Functions $v(c)$ have been given which satisfy this equation within terms of order $\nu^{-3}$ (Welch, 1947a, b), and within terms of order $\nu^{-5}$ (Aspin, 1948). Tables (for the case $k = 2$) have been given by Aspin (1949) and by Trickett, Welch & James (1956) (see also Pearson & Hartley, 1954).

Now a different solution to the problem of finding limits for the difference of two means was proposed by Behrens (1929). This has been rederived by Fisher (1935, 1939, 1941) and Yates (1939) using the theory of fiducial distributions of population parameters, and by Jeffreys (1940) using an inverse probability approach. Tables (again for the case $k = 2$) have been given by Sukhatme (1938) and by Fisher (1941) (see also Fisher & Yates, 1953). The argument using fiducial distributions is as follows. Let

$$t_i = (x_i-\mu_i)/(\lambda_i^{1/2}s_i) = \beta_i(x_i-\mu_i)/a_i^{1/2}. \qquad (2.2)$$

Then the $t_i$ have independent Student distributions with $\nu_i$ degrees of freedom. Equation (2.2) is equivalent to

$$\mu_i = x_i - a_i^{1/2}t_i/\beta_i, \qquad (2.3)$$

and so

$$\eta = y - \Sigma a_i^{1/2}t_i. \qquad (2.4)$$

It is now asserted that given a unique set of values $x_i$ and $s_i^2$, (2.3) gives the distribution (in the fiducial sense) of the parameters $\mu_i$. That is to say, if $t_{iP}$ denotes the value of $|t_i|$, based on $\nu_i$ degrees of freedom, which is only exceeded with probability $(1-P)$, then the fiducial probability that $\mu_i$ lies between the (fixed) limits $x_i \pm a_i^{1/2}t_{iP}/\beta_i$ is numerically equal to $P$. That is to say the fiducial distribution of $\mu_i$ is obtained from that of $t_i$, using (2.3) and treating $x_i$ and $a_i$ as constants: it is merely a transformed $t$-distribution. Likewise the fiducial distribution of $\eta$, given the unique samples, is said to be that of the linear combination of $t$-variates given by (2.4), with $y$ and the $a_i$ treated as constants. Now

$$v = a^{-1/2}(y-\eta) = \Sigma c_i^{1/2}t_i. \qquad (2.5)$$

Thus if $d = d(c)$ denotes that number which is such that (for constant $c_i$)

$$\Pr[\,|\Sigma c_i^{1/2}t_i| < d(c) \mid c\,] = P, \qquad (2.6)$$

then it is asserted that

$$\mathrm{Fp}[\,|v| = a^{-1/2}|\eta-y| < d(c)\,] = P, \qquad (2.7)$$


where 'Fp' denotes the fiducial probability† of the relation following. This is a fiducial statement about $\eta$ given a unique set of values $x_i$ and $s_i^2$. It is not asserted, and it is not true in general, that

$$\Pr[\,|v| = a^{-1/2}|y-\eta| < d(c)\,] = P, \qquad (2.8)$$

where this statement is interpreted in the ordinary way ($\eta$ a constant; $y$, $a$ and the $c_i$ having their ordinary direct probability distributions). It is held by those who use the fiducial argument that the fact that (2.8) is untrue is completely irrelevant, and that when, as in many other procedures based on normal distributions, statements equivalent to (2.7) and (2.8) are both true (each with its own interpretation), this is a confusing accident.

However, this does not seem to be the place to argue about the two points of view. My purpose in mentioning the fiducial argument is to recall that Yates (1939) showed that it leads to a solution of the weighted mean problem which can be applied using the same tables as are needed for the Behrens-Fisher problem.‡ For if the $r_i$ can be regarded as constants,

$$u = w^{1/2}(\bar{x}-\mu) = \Sigma r_i^{1/2}t_i \qquad (2.9)$$

has the same distribution as $v$ of (2.5), except that the quantities $c_i$ are replaced by the $r_i$. Thus

$$\mathrm{Fp}[\,|u| = w^{1/2}|\mu-\bar{x}| < d(r)\,] = P. \qquad (2.10)$$


This equation gives limits, in Fisher's fiducial sense, for the parameter $\mu$. The duality between the original Behrens-Fisher problem and the weighted mean problem with $k = 2$ is particularly striking, because with $\beta_1$ and $\beta_2 = \pm 1$ we have $r_1 = c_2$ and $r_2 = c_1$. In the Sukhatme-Fisher tables, where the quantity $\theta = \sin^{-1}c^{1/2}$ is used as an argument instead of $c = c_1$, the change merely amounts to using $(\tfrac{1}{2}\pi-\theta)$ in place of $\theta$.

If confidence limits in the ordinary sense of equation (1.1) are required, the tables of Welch's $v(c)$ cannot be adapted in a similar manner, and completely new ones, given later, are necessary. Even with the two sets of tables available the duality (if indeed that be the correct word) is very one-sided; for the $u$-problem is fundamentally more complicated than Welch's $v$-problem. This is because in the latter the combination occurring in the numerator is $\beta_1 x_1 + \beta_2 x_2$, the $\beta_i$ being constants, whereas in the former it is $w_1 x_1 + w_2 x_2$, the $w_i$ being random variables. In fiducial theory the $w_i$ are also (at a certain stage) treated as constants, thus leading to an almost complete duality between the two problems.


† I am assuming that an expression such as the left-hand side of (2.7) is capable of a unique interpretation. Fisher has laid great emphasis on the theoretical necessity of defining the fiducial distributions of parameters in a unique fashion, but even when one only makes use of sample statistics jointly sufficient for the parameters, it sometimes seems to be possible to produce more than one fiducial distribution for the same quantity. The interested reader may consult Creasy (1954) (including the Discussion held at the end of the Symposium), and Mauldon (1955).

‡ As in the solution discussed later, relatively small amounts of information on $\sigma_1^2, \dots, \sigma_k^2$ contained in the contrasts between $x_1, \dots, x_k$ are neglected, this being not theoretically in accord with the conditions laid down by Fisher for the derivation of fiducial distributions; but in practice this can be neglected for the time being.


3. THE ASYMPTOTIC SOLUTION


We suppose that we have quantities $x_1, \dots, x_k$ which are normally distributed with the same mean $\mu$ but with possibly different variances $\alpha_1 = 1/\omega_1, \dots, \alpha_k = 1/\omega_k$. Also that $a_1 = 1/w_1, \dots, a_k = 1/w_k$ are estimates of the $\alpha_i$, $\nu_i a_i/\alpha_i$ being distributed as $\chi^2$ with $\nu_i$ degrees of freedom, the $\nu_i$ being known positive integers. The $x_i$ and $a_i$ distributions are supposed all to be mutually independent. No prior knowledge of $\mu$ or the $\alpha_i$ is assumed, and it is required to find confidence limits for $\mu$ with confidence coefficient (asymptotically) equal to $P$. Let $\bar{x} = \Sigma w_i x_i/w$, where $w$ denotes $\Sigma w_i$. Then the problem will be solved if we can find a function $u(a_1, \dots, a_k) = u(a)$ of the sample variances (and of $P$ and the $\nu_i$) which is such that (asymptotically)

$$\Pr[\,w^{1/2}|\bar{x}-\mu| < u(a)\,] = P. \qquad (3.1)$$

As remarked in §1, $u(a)$ actually depends only on the ratios of the sample variances or weights, and can be written $u(r)$, where $r_i = w_i/w$, but the above form is more convenient for present purposes. It must be remembered that both sides of the inequality in (3.1) are random variables, and that $w$ and $\bar{x}$ depend upon the $a_i$, as well as $u(a)$.

The method of solution is basically that due to Welch (1947a). It turns out to be possible to express $u(a)$ in the form

$$u(a) = u_0(a) + u_1(a) + u_2(a) + \dots, \qquad (3.2)$$

where each $u_s(a)$ is of order $\nu^{-s}$, and where, if $\sum_{s=0}^{m} u_s(a)$ is written in place of $u(a)$ in (3.1), the equality is satisfied to within terms of order $\nu^{-(m+1)}$. Large-sample theory gives the initial term $u_0(a) = \chi$, where

$$(2\pi)^{-1/2}\int_{-\chi}^{\chi} e^{-\frac{1}{2}t^2}\,dt = P. \qquad (3.3)$$

The first corrective term can be derived by making the appropriate substitutions in the general theory presented in a previous paper (James, 1954, equation (2.53) or (2.59)). It is

$$u_1(a) = \tfrac{1}{4}\chi\,\Sigma\{(8-7r_i)r_i + \chi^2 r_i^2\}/\nu_i, \qquad (3.4)$$


where $r_i = w_i/w$. The present problem is an example of what was there called the 'general case' of a linear hypothesis with variances requiring separate estimation; that is to say, the estimate $\bar{x}$ of $\mu$ is itself functionally dependent on the variance estimates. This makes for analytic complication, and only this first corrective term was given for the general problem considered there. (For the 'special case' where the estimates of the parameters in the linear hypothesis are functionally independent of the variance estimates a second correction term was given.) However, for the particular problem of weighted means the second and third corrective terms are manageable, although very lengthy calculations are involved. (Mrs Aspin recorded that the algebra for the fourth corrective term in Welch's $v$-problem ran to more than 100 pages; anyone wishing to tackle the same term in the present problem should be prepared for considerably more work than this.)
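The first two terms of the series are simple enough to compute directly. The sketch below is illustrative only: the function name is hypothetical, and the first corrective term is taken in the form $\tfrac{1}{4}\chi\,\Sigma\{(8-7r_i)r_i + \chi^2 r_i^2\}/\nu_i$, which is an assumption to the extent that it was reconstructed from a damaged printing. When $k = 1$ it reduces to the familiar first-order expansion of Student's $t$, $\chi\{1 + (\chi^2+1)/(4\nu)\}$.

```python
from statistics import NormalDist

def u_first_order(r, nu, P=0.95):
    """Return (u0, u0 + u1) for ratios r_i = w_i/w and degrees of freedom nu_i.

    u0 is the normal percentage point chi; u1 is the first corrective
    term, of order 1/nu (form assumed as stated in the lead-in).
    """
    chi = NormalDist().inv_cdf(0.5 * (1 + P))
    u1 = 0.25 * chi * sum(
        ((8 - 7 * ri) * ri + chi ** 2 * ri ** 2) / ni for ri, ni in zip(r, nu)
    )
    return chi, chi + u1
```

The second and third corrective terms are not included here, so for small degrees of freedom the result will fall short of the tabulated values.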

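The following is a brief summary of Welch's computational technique. The left-hand side of (3.1) can be written as the average, over the distributions of the variance estimates $a_i$, of the conditional probability that $w^{1/2}|\bar{x}-\mu| < u(a)$ for fixed $a_i$. That is to say, our equation (3.1) for $u(a)$ can be written schematically as

$$\int \Pr[\,w^{1/2}|\bar{x}-\mu| < u(a) \mid a\,]\,\Pr[da] = P. \qquad (3.5)$$

The left-hand side can be expressed as a series containing terms of order $1, \nu^{-1}, \nu^{-2}, \dots$, and depending upon $\Pr[\,\omega^{1/2}|\tilde{x}-\mu| < u(\alpha)\,]$ (where $\tilde{x}$ denotes $\Sigma\omega_i x_i/\omega$) and its derivatives with respect to the $\alpha_i$. (The $\alpha_i$ involved parametrically in the distribution of $x$ are regarded as constants in evaluating these derivatives, but not those involved in the expressions $\omega$, $\tilde{x}$ and $u(\alpha)$.) The series (3.2) is now substituted for $u(\alpha)$ and terms of corresponding orders on both sides are equated. There results the following set of equations for $u_0(\alpha), u_1(\alpha), \dots$, and hence for $u_0(a), u_1(a), \dots$:

$$\Pr[\,\omega^{1/2}|\tilde{x}-\mu| < u_0(\alpha)\,] = P, \quad \text{giving } u_0(\alpha) = u_0(a) = \chi, \qquad (3.6)$$

$$[\,u_1(\alpha)D + \Sigma\alpha_i^2\partial_i^2/\nu_i\,]\,\Pr[\,\omega^{1/2}|\tilde{x}-\mu| < \chi\,] = 0, \qquad (3.7)$$

$$[\,u_2(\alpha)D + \tfrac{1}{2}u_1^2(\alpha)D^2 + (\Sigma\alpha_i^2\partial_i^2/\nu_i)\,u_1(\alpha)D + \tfrac{4}{3}\Sigma\alpha_i^3\partial_i^3/\nu_i^2 + \tfrac{1}{2}\Sigma\Sigma\alpha_i^2\alpha_j^2\partial_i^2\partial_j^2/\nu_i\nu_j\,]\,\Pr[\,\omega^{1/2}|\tilde{x}-\mu| < \chi\,] = 0, \qquad (3.8)$$

$$[\,u_3(\alpha)D + u_1(\alpha)u_2(\alpha)D^2 + \tfrac{1}{6}u_1^3(\alpha)D^3 + (\Sigma\alpha_i^2\partial_i^2/\nu_i)(u_2(\alpha)D + \tfrac{1}{2}u_1^2(\alpha)D^2) + (\tfrac{4}{3}\Sigma\alpha_i^3\partial_i^3/\nu_i^2 + \tfrac{1}{2}\Sigma\Sigma\alpha_i^2\alpha_j^2\partial_i^2\partial_j^2/\nu_i\nu_j)\,u_1(\alpha)D + 2\Sigma\alpha_i^4\partial_i^4/\nu_i^3 + \tfrac{4}{3}\Sigma\Sigma\alpha_i^3\alpha_j^2\partial_i^3\partial_j^2/\nu_i^2\nu_j + \tfrac{1}{6}\Sigma\Sigma\Sigma\alpha_i^2\alpha_j^2\alpha_l^2\partial_i^2\partial_j^2\partial_l^2/\nu_i\nu_j\nu_l\,]\,\Pr[\,\omega^{1/2}|\tilde{x}-\mu| < \chi\,] = 0. \qquad (3.9)$$

Here $\partial_i$ denotes $\partial/\partial\alpha_i$ (interpreted in the sense indicated above) and $D$ denotes $\partial/\partial\chi$. In a term like $\alpha_i^2\partial_i^2 u_1(\alpha)D\,\Pr[\dots]$, $\partial_i^2$ operates on $u_1(\alpha)$ and on $\Pr[\dots]$, but not on the initial $\alpha_i^2$. The derivatives with respect to the $\alpha_i$ are most readily evaluated by the Taylor expansion technique used in my 1954 paper. After evaluating all the derivatives involved in (3.7), (3.8) and (3.9) and performing the summations we finally find the result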
ua) or u(r) =x{1 + A(x? —7) Ug + 203} + (x4 — 312 + 93) Ugg 
+ (11x? — 61) Ugg + 8U 2 + ds( — 9x4 + 208? — 543) UZ, 
+3(—x2+ 7) Uy Uy — 205} 
+ {h(5y° — 4174 + 4957? — 8997) Uj, 
ne — 6432+ 1668) Ugg + 11(7x2— 33) Ug + 3205 
“ — x'+ 31x? — 93) U3. U1 + 3( — 99x* + 1984x? — 4677) Up. Uo 
— 11x? + 61) U,2U,, + 6( — x? + 7) Uy. Ug, — 16,2, 
sa(24 “ag¢ — 12053y4 + 105569? — 153063) UZ, 
(9x4 — 208? + 543) U3, Uy, + 42x? — 7) Up, U9, + 4033) + O(v). (3°10) 


i s.+ +? 


: x 
3( 
oY 
38 
i5(9 
16 


where

$$U_{st} = \Sigma r_i^s/\nu_i^t, \qquad r_i = w_i/w. \qquad (3.11)$$

As a check on the lengthy algebra involved the calculation was repeated, but instead of starting with (3.1) the equivalent equation

$$\Pr[\,w(\bar{x}-\mu)^2 < 2h(a)\,] = P \qquad (3.12)$$

for $2h(a) = [u(a)]^2$ was used. The zero-order approximation to $h(a)$ is $\xi = \tfrac{1}{2}\chi^2$, where

$$\pi^{-1/2}\int_0^{\xi} t^{-1/2}e^{-t}\,dt = P, \qquad (3.13)$$

and writing $h(a) = \xi + h_1(a) + \dots$ we complete the solution as before. Finally, $u(a)$ is evaluated as the square root (to order $\nu^{-3}$) of $2h(a)$.
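The numerical coefficients $1$, $\tfrac{4}{3}$ and $2$ multiplying the sums in the equations for $u_1$, $u_2$ and $u_3$ can be traced to the moment generating function of $\chi^2$. The following is a sketch of that step under the stated distributional assumption ($\nu_i a_i/\alpha_i$ distributed as $\chi^2$ with $\nu_i$ degrees of freedom); the symbol $f$ is introduced here only for illustration and stands for any sufficiently smooth function of the $a_i$.

```latex
% Averaging f(a) over the a_i by a formal Taylor (shift) operator.
\begin{align*}
E\,f(a) &= \prod_i E\exp\{(a_i-\alpha_i)\partial_i\}\,f(\alpha)
         = \prod_i e^{-\alpha_i\partial_i}
           \Bigl(1-\frac{2\alpha_i\partial_i}{\nu_i}\Bigr)^{-\nu_i/2} f(\alpha),\\
-\tfrac12\nu_i\log\Bigl(1-\frac{2\alpha_i\partial_i}{\nu_i}\Bigr)-\alpha_i\partial_i
        &= \frac{\alpha_i^2\partial_i^2}{\nu_i}
         + \frac{4}{3}\,\frac{\alpha_i^3\partial_i^3}{\nu_i^2}
         + \frac{2\,\alpha_i^4\partial_i^4}{\nu_i^3} + \dots
\end{align*}
```

Exponentiating the second line and collecting terms of equal order in $1/\nu$ gives the products of sums, with coefficients $\tfrac{1}{2}$ and $\tfrac{1}{6}$, that appear alongside them.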











4. TABLES FOR THE TWO-SAMPLE PROBLEM

In the case $k = 2$, $u(r)$ is a function of the single quantity

$$r = r_1 = \frac{w_1}{w_1+w_2}, \qquad (4.1)$$

apart from $\nu_1$, $\nu_2$ and $P$.

The values for $(1-P) = 0.10$ were worked out on an ordinary desk computer, using equation (3.10). These values were checked, and the values for $(1-P) = 0.05$, $0.02$ and $0.01$ were worked out on the automatic digital computing machine at Manchester University.

The results have here been tabulated after rounding off to two decimal places. For infinite degrees of freedom tables of the normal distribution show that a rounding-off error of 0.005 implies a change in probability varying from about 0.0021 when $(1-P) = 0.10$ to about 0.00029 when $(1-P) = 0.01$; that is, to put the most unfavourable light on it, the maximum error due to this cause is less than 3% of $(1-P)$ for the values tabulated. Since the values of $u(r)$ are always at least as large as the values with infinite degrees of freedom, rounding-off errors are likely to be even less serious in general. Moreover, when the degrees of freedom are only moderate a very considerable averaging effect takes place, for $r$ varies from sample to sample and some tabular values are likely to be too small and others too large. For example, in an Appendix to Mrs Aspin's tables (1949) for the $v$-problem, Welch shows that, in the particular case $(1-P) = 0.10$, $\nu_1 = \nu_2 = 6$ (our notation) and equal population variances, the true probability using the two-place tables never differs from the nominal by more than about 0.0004, compared with the bound 0.0021 suggested above. I have not done any similar calculations using the two-place tabular values given here, but some details of the performance of the five-place figures from which they were derived are given in §6.

In deciding on the minimum values of $\nu_1$ and $\nu_2$ to be given in the tables, the principle adopted was that alterations of at most two or three units shall be necessary in the second decimal place, should some improved method of calculation become available in the future. Of course guessing 'what the next term would be' is a dangerous procedure, but the third corrective term is never greater than 0.04; this value occurs when $(1-P) = 0.01$, $\nu_1 = \nu_2 = 10$, $r = 0.3$ or $0.7$. A possible future change of several units in the last place has been accepted as a compromise between the desire to produce tables which are mathematically correct, and ones which shall be practically useful. The standard is lower than that imposed in the Welch-Aspin tables, where not only is the term of order $\nu^{-4}$ available for evaluation, but the series itself is more quickly convergent (in the practical sense of the word). If it were merely required that the probabilities should be close to their nominal values it would no doubt be possible to extend the range (cf. Table 6.1).


5. INTERPOLATION IN THE TABLES 


For most purposes it will suffice to interpolate linearly with respect to all three arguments, although in certain cases this may lead to an error of several units in the second decimal place. In cases of doubt, harmonic interpolation should be used for the degrees of freedom. Harmonic interpolation should in any case be used for either $\nu_i$ in the panel which includes $\nu_i = \infty$.
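Harmonic interpolation is here taken to mean linear interpolation in the reciprocal of the degrees of freedom, so that $\nu = \infty$ corresponds to $1/\nu = 0$. A minimal sketch follows; the function name is hypothetical and the tabular values in the usage note are made up for illustration, not taken from the tables.

```python
def harmonic_interpolate(nu, nu_lo, u_lo, nu_hi, u_hi):
    """Interpolate a tabular value linearly in 1/nu between two tabulated nu."""
    def inv(v):
        return 0.0 if v == float("inf") else 1.0 / v

    frac = (inv(nu_lo) - inv(nu)) / (inv(nu_lo) - inv(nu_hi))
    return u_lo + frac * (u_hi - u_lo)
```

For instance, with hypothetical entries 2.15 at $\nu = 20$ and 2.00 at $\nu = \infty$, `harmonic_interpolate(40, 20, 2.15, float("inf"), 2.0)` lies halfway between the two entries, since $1/40$ is halfway between $1/20$ and $0$.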










6. THE ACCURACY OF THE SOLUTION 


The performance of the solution (3-10), or of any other proposed solution, may be checked 
by numerical integration. This process could no doubt also be made the basis of an iterative 
method of calculating u(r), like that described by Trickett & Welch (1954) for the dual 
problem. The integral to be evaluated can be derived as follows. 

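The quantities $\chi_i^2 = \nu_i\omega_i/w_i$ are distributed independently of the $x_i$ and of each other in $\chi^2$-distributions with $\nu_i$ degrees of freedom. Write

$$\chi^2 = \Sigma\chi_i^2, \qquad b_i = \chi_i^2/\Sigma\chi_i^2 \quad (\Sigma b_i = 1). \qquad (6.1)$$

Then $\chi^2$ has a $\chi^2$-distribution with $\nu_0 = \Sigma\nu_i$ degrees of freedom, and is independent of the distribution of the $b_i$, which is

$$p(b)\,db = \frac{\Gamma(\tfrac{1}{2}\nu_0)}{\Pi\,\Gamma(\tfrac{1}{2}\nu_i)}\,\Pi\,b_i^{\frac{1}{2}\nu_i-1}\,\Pi'\,db_i, \qquad (b_1, \dots, b_k > 0;\ \Sigma b_i = 1). \qquad (6.2)$$

$\Pi$ denotes a product over $i = 1, \dots, k$, and $\Pi'$ a product over $i = 1, \dots, (k-1)$ (say). We easily find

$$w^{1/2}(\bar{x}-\mu) = \frac{\Sigma w_i(x_i-\mu)}{w^{1/2}} = \frac{\Sigma\nu_i\omega_i(x_i-\mu)/b_i}{(\chi^2\,\Sigma\nu_i\omega_i/b_i)^{1/2}}$$

$$= \frac{\Sigma\nu_i\omega_i(x_i-\mu)/b_i}{(\Sigma\nu_i^2\omega_i/b_i^2)^{1/2}\,(\chi^2/\nu_0)^{1/2}}\left[\frac{\Sigma\nu_i^2\rho_i/b_i^2}{\nu_0\,\Sigma\nu_i\rho_i/b_i}\right]^{1/2}, \qquad (6.3)$$

where

$$\rho_i = \omega_i/\omega, \qquad \Sigma\rho_i = 1. \qquad (6.4)$$

Now the conditional distribution of the first factor in (6.3), given the $b_i$, is that of $t$ with $\nu_0$ degrees of freedom, by the usual rule. (Its marginal distribution is the same, but we make no use of this fact.) We also have

$$u(r) \text{ or } u(r_i) = u\left(\frac{\nu_i\rho_i/b_i}{\Sigma\nu_j\rho_j/b_j}\right). \qquad (6.5)$$

Therefore

$$\Pr[\,w^{1/2}|\bar{x}-\mu| < u(r)\,] = \int \Pr[\,w^{1/2}|\bar{x}-\mu| < u(r) \mid b\,]\,\Pr(db) \quad (\text{cf. } (3.5))$$

$$= \int F_{\nu_0}\left[\left(\frac{\nu_0\,\Sigma\nu_i\rho_i/b_i}{\Sigma\nu_i^2\rho_i/b_i^2}\right)^{1/2} u\left(\frac{\nu_i\rho_i/b_i}{\Sigma\nu_j\rho_j/b_j}\right)\right] p(b)\,db, \qquad (6.6)$$

where $F$ denotes the ordinary $t$-integral:

$$F_\nu(t) = [B(\tfrac{1}{2}, \tfrac{1}{2}\nu)]^{-1}\int_{-t/\sqrt{\nu}}^{t/\sqrt{\nu}} (1+x^2)^{-\frac{1}{2}(\nu+1)}\,dx. \qquad (6.7)$$

The integral in (6.6) is a $(k-1)$-fold one over the distribution (6.2). In the particular case $k = 2$ (6.6) may be written

$$\int_0^1 F_{\nu_0}\left[\left(\frac{\nu_0 bb'(\nu_1\rho b' + \nu_2\rho' b)}{\nu_1^2\rho b'^2 + \nu_2^2\rho' b^2}\right)^{1/2} u\left(\frac{\nu_1\rho b'}{\nu_1\rho b' + \nu_2\rho' b}\right)\right] \frac{b^{\frac{1}{2}\nu_1-1}\,b'^{\frac{1}{2}\nu_2-1}}{B(\tfrac{1}{2}\nu_1, \tfrac{1}{2}\nu_2)}\,db, \qquad (6.8)$$

where $\rho = \omega_1/\omega = (1-\rho')$ and $b' = (1-b)$. The integrals (6.6) and (6.8) simplify in various ways for equal degrees of freedom, and for $\rho_i \propto 1/\nu_i$. For example, when

$$k = 2, \qquad \nu_1 = \nu_2 = \nu, \qquad \rho = \tfrac{1}{2},$$

(6.8) becomes

$$\int_0^1 F_{2\nu}\left[\left(\frac{2bb'}{b^2+b'^2}\right)^{1/2} u(b')\right] \frac{(bb')^{\frac{1}{2}\nu-1}}{B(\tfrac{1}{2}\nu, \tfrac{1}{2}\nu)}\,db. \qquad (6.9)$$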


The simplification in the argument of u(r) is of great assistance in its numerical evaluation, 
for if a ten-panel integration formula is used no interpolation is necessary in the tables of 
u(r), and interpolation is of the simplest kind with twenty panels. 
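The two-sample integral can be evaluated with a few lines of code. The sketch below is not the method used for the original calculations: it applies a composite Simpson rule (a choice made here for simplicity) both to the $t$-integral and to the outer integral over $b$, and all function names are hypothetical. The argument `u` is any function of $r$; passing a constant equal to the normal percentage point gives the performance of the ordinary large-sample approximation.

```python
import math
from statistics import NormalDist

def simpson(f, a, b, n):
    """Composite Simpson rule with n (even) panels."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def beta_fn(a, b):
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

def t_integral(t, nu, n=200):
    """Pr(|T| < t) for Student's t with nu degrees of freedom."""
    upper = t / math.sqrt(nu)
    area = simpson(lambda x: (1 + x * x) ** (-0.5 * (nu + 1)), 0.0, upper, n)
    return 2 * area / beta_fn(0.5, 0.5 * nu)

def coverage_two_sample(u, nu1, nu2, rho, n=400):
    """Probability that |u-statistic| < u(r) when k = 2 and 0 < rho < 1."""
    nu0, rho_c = nu1 + nu2, 1 - rho
    norm = beta_fn(0.5 * nu1, 0.5 * nu2)

    def integrand(b):
        bc = 1 - b
        mix = nu1 * rho * bc + nu2 * rho_c * b
        scale = nu0 * b * bc * mix / (nu1 ** 2 * rho * bc ** 2 + nu2 ** 2 * rho_c * b ** 2)
        r = nu1 * rho * bc / mix
        density = b ** (0.5 * nu1 - 1) * bc ** (0.5 * nu2 - 1) / norm
        return t_integral(math.sqrt(scale) * u(r), nu0) * density

    return simpson(integrand, 0.0, 1.0, n)
```

With `chi = NormalDist().inv_cdf(0.95)`, the call `coverage_two_sample(lambda r: chi, 8, 8, 0.5)` gives the actual coverage of the nominal 90% large-sample limits for $\nu_1 = \nu_2 = 8$ and equal true weights.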

The results recorded in Table 6.1 have been obtained in the case $k = 2$, using the five-place values of $u(r)$ from which the two-place tables were compiled, and also the lower-order approximations correct to $\nu^{-2}$, $\nu^{-1}$, 1. The last-mentioned is the ordinary large-sample approximation.


Table 6.1. Actual values of $100(1-P)$ compared with nominal values

| Order of approximation in the $1/\nu_i$ | Nominal 10% ($\nu_1 = \nu_2 = 8$): $\rho = 0$ or 1 | $\rho = 0.2$ or 0.8 | $\rho = 0.5$ | Nominal 1% ($\nu_1 = \nu_2 = 10$): $\rho = 0$ or 1 | $\rho = 0.5$ |
| Zero   | 13.86 | 15.87 | 16.54 | 2.76 | 3.33 |
| First  | 10.38 | -     | 10.95 | 1.19 | -    |
| Second | 10.03 | -     | 10.07 | 1.02 | -    |
| Third  | 10.00 | 9.99  | 10.02 | 1.00 | 1.01 |


The calculations for the two-tailed 1% point with $\nu_1 = \nu_2 = 10$ and $\rho = 0.5$ were carried out because of the large value (about 0.04) of the third corrective term, $u_3(r)$, in the region $r = 0.3$ to $0.7$. It is seen that the actual probability is 1% for practical purposes, even although the large-sample procedure would give rise to a probability over three times as great.


7. CONFIDENCE LIMITS FOR RATIOS 


Finney (1950, 1952) has given some useful extensions of a method which was first given in its general form by Fieller (1940). These results all have to do with finding limits for ratios. Fieller's procedure is as follows. Suppose that $y$ and $z$ are normally distributed, unbiased estimates of $\eta$ and $\zeta$, and that their variances are $\lambda\sigma^2$ and $\lambda'\sigma^2$ and that their covariance is $\lambda''\sigma^2$; $\lambda$, $\lambda'$, $\lambda''$ are known constants, but $\sigma^2$ is unknown and is independently estimated by $s^2$ based on $\nu$ degrees of freedom. It is required to find limits for $\mu = \eta/\zeta$ from the data. Now $(y-\mu z)/\{s(\lambda - 2\mu\lambda'' + \mu^2\lambda')^{1/2}\}$ has the $t$-distribution with $\nu$ degrees of freedom. Thus if $t$ denotes the value only exceeded (numerically) with probability $(1-P)$, we have

$$\Pr[\,(y-\mu z)^2 < t^2 s^2(\lambda - 2\mu\lambda'' + \mu^2\lambda')\,] = P. \qquad (7.1)$$

Thus a corresponding confidence region for $\mu$ consists of those values which satisfy the inequality on the left-hand side. The typical case is when $z^2 - t^2 s^2\lambda' > 0$, that is to say when the denominator $z$ is significantly different from zero (at level $(1-P)$). In this case the region is an ordinary interval, confidence limits for $\mu$ being

$$\mu_1, \mu_2 = \frac{yz - t^2 s^2\lambda'' \pm ts[(z^2\lambda - 2yz\lambda'' + y^2\lambda') - t^2 s^2(\lambda\lambda' - \lambda''^2)]^{1/2}}{z^2 - t^2 s^2\lambda'}$$

$$= \frac{m - g\lambda''/\lambda' \pm (ts/z)[(\lambda - 2m\lambda'' + m^2\lambda') - g(\lambda - \lambda''^2/\lambda')]^{1/2}}{1-g}, \qquad (7.2)$$

where

$$m = y/z \quad \text{and} \quad g = t^2\lambda' s^2/z^2. \qquad (7.3)$$


But it is important to realize (because similar phenomena can occur in the generalizations considered below, although they will not be explicitly mentioned) that if $z^2 - t^2 s^2\lambda' < 0$, i.e. if $g > 1$, then the limits (7.2) become exclusive; that is to say, the region consists of the two parts $\mu < \mu_1$ and $\mu > \mu_2$ (taking $\mu_1 < \mu_2$), and does not exclude the value $\mu = \infty$. It is even possible for the discriminant of the quadratic to become negative in this case, when the confidence region becomes $-\infty < \mu < \infty$; that is to say, the data are in agreement (at the given significance level) with any hypothetical value of $\mu$. These phenomena are discussed in detail by Fieller (1954). The second form of (7.2) is useful because when $g$ is very small it reduces to the result that would be obtained intuitively from the evaluation of the variance of $m$ using 'statistical differentials' (and ignoring the inconvenient fact that this variance is really infinite).
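Fieller's limits in the well-determined case can be computed from the second form of the expression for the limits. The following is a minimal sketch with hypothetical names; `lam`, `lam1` and `lam2` stand for $\lambda$, $\lambda'$ and $\lambda''$, and `t` is the appropriate percentage point of Student's distribution, supplied by the user. The exclusive and unbounded regions described above are not handled; the function refuses them.

```python
import math

def fieller_limits(y, z, s2, lam, lam1, lam2, t):
    """Confidence limits for eta/zeta when g = t^2 * lam1 * s2 / z^2 < 1."""
    g = t * t * lam1 * s2 / (z * z)
    if g >= 1:
        raise ValueError("denominator not significantly different from zero (g >= 1)")
    m = y / z
    disc = (lam - 2 * m * lam2 + m * m * lam1) - g * (lam - lam2 ** 2 / lam1)
    half = (t * math.sqrt(s2) / abs(z)) * math.sqrt(disc)
    centre = m - g * lam2 / lam1
    return (centre - half) / (1 - g), (centre + half) / (1 - g)
```

Each returned limit satisfies $(y-\mu z)^2 = t^2 s^2(\lambda - 2\mu\lambda'' + \mu^2\lambda')$, which gives a convenient check on any hand calculation.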

Finney's generalizations make use of the Behrens-Fisher distribution, and lead to fiducial intervals, but not to confidence intervals in the direct probability sense. We now consider how these problems can be dealt with using the Welch-Aspin tables of $v$ and the new tables of $u$.

First suppose that the conditions are the same as before, except that the variances of $y$ and $z$ are $\lambda\sigma^2$ and $\lambda'\sigma'^2$, while their covariance vanishes. Independent estimates $s^2$ and $s'^2$ of $\sigma^2$ and $\sigma'^2$, based on $\nu$ and $\nu'$ degrees of freedom, are available. Then $(y-\mu z)$ has zero expectation and variance $\lambda\sigma^2 + \mu^2\lambda'\sigma'^2$ estimated by $\lambda s^2 + \mu^2\lambda' s'^2$. Thus

$$\Pr[\,(y-\mu z)^2 < \{v(c)\}^2(\lambda s^2 + \mu^2\lambda' s'^2)\,] = P, \qquad (7.4)$$

where

$$c = \lambda s^2/(\lambda s^2 + \mu^2\lambda' s'^2) \qquad (7.5)$$

and $v(c)$ is the value given by the Welch-Aspin tables for $\nu$ and $\nu'$ degrees of freedom and probability $P$. In the case of a well-determined denominator (7.4) gives the confidence limits

$$\mu_1, \mu_2 = \frac{m \pm (v/z)[(1-g)\lambda s^2 + m^2\lambda' s'^2]^{1/2}}{1-g}, \qquad (7.6)$$

where $g = v^2\lambda' s'^2/z^2$ and $v$ is short for $v(c)$. Since $c$, and hence $v$, depends on $\mu_1$, or $\mu_2$, as the case may be, calculations have to be made iteratively using (7.5) and (7.6) alternately. As an initial approximation one could take $\mu = m$ in (7.5).
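The alternation between the expression for $c$ and the expression for the limits is easily mechanized. The sketch below is illustrative only: `v_of_c` is a hypothetical user-supplied function returning the tabular value $v(c)$ (for example, by interpolation in the Welch-Aspin tables, which are not reproduced here), and the stopping rule and iteration cap are choices made for this sketch.

```python
import math

def ratio_limits_unequal_variances(y, z, s2, s2p, lam, lamp, v_of_c,
                                   tol=1e-10, max_iter=100):
    """Iterate c -> v(c) -> limit separately for the lower and upper limit."""
    m = y / z
    limits = []
    for sign in (-1.0, 1.0):
        mu = m  # initial approximation: mu = m
        for _ in range(max_iter):
            c = lam * s2 / (lam * s2 + mu * mu * lamp * s2p)
            v = v_of_c(c)
            g = v * v * lamp * s2p / (z * z)
            if g >= 1:
                raise ValueError("denominator not well determined (g >= 1)")
            root = math.sqrt((1 - g) * lam * s2 + m * m * lamp * s2p)
            new_mu = (m + sign * (v / abs(z)) * root) / (1 - g)
            converged = abs(new_mu - mu) < tol
            mu = new_mu
            if converged:
                break
        limits.append(mu)
    return tuple(limits)
```

If `v_of_c` is a constant the iteration stops at once and the result coincides with Fieller's limits for zero covariance.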

Now suppose that $\eta_1/\zeta_1 = \dots = \eta_k/\zeta_k = \mu$, and that we have unbiased and normally distributed estimates $y_1, \dots, y_k$, $z_1, \dots, z_k$ of $\eta_1, \dots, \eta_k$, $\zeta_1, \dots, \zeta_k$. The variances and covariance of $y_i$, $z_i$ are $\lambda_i\sigma_i^2$, $\lambda_i'\sigma_i^2$, $\lambda_i''\sigma_i^2$, but different pairs are independent. Independently distributed estimates $s_i^2$ of the $\sigma_i^2$, based on $\nu_i$ degrees of freedom, are available. We require to find confidence limits for $\mu$. Now $x_i = y_i - \mu z_i$ has zero expectation and variance estimated by $(\lambda_i - 2\mu\lambda_i'' + \mu^2\lambda_i')s_i^2$. Therefore if we define

$$w_i = 1/\{(\lambda_i - 2\mu\lambda_i'' + \mu^2\lambda_i')s_i^2\}, \qquad w = \Sigma w_i, \qquad (7.7)$$

then

$$\Pr[\,(\Sigma w_i x_i)^2 < \{u(r)\}^2 w\,] = P, \qquad (7.8)$$

where

$$r_i = \frac{w_i}{w} = \frac{1/\{(\lambda_i - 2\mu\lambda_i'' + \mu^2\lambda_i')s_i^2\}}{\Sigma_j\,1/\{(\lambda_j - 2\mu\lambda_j'' + \mu^2\lambda_j')s_j^2\}}, \qquad (7.9)$$

and $u(r)$ is the value given by the tables in this paper (or their theoretical counterparts when $k > 2$) for the specified degrees of freedom and probability. Thus the corresponding confidence region is that of those values of $\mu$ which satisfy

$$\left[\Sigma\frac{y_i - \mu z_i}{(\lambda_i - 2\mu\lambda_i'' + \mu^2\lambda_i')s_i^2}\right]^2 < [u(r)]^2\,\Sigma\frac{1}{(\lambda_i - 2\mu\lambda_i'' + \mu^2\lambda_i')s_i^2}. \qquad (7.10)$$
This rather complicated inequality, which has to be considered in conjunction with the 
values of the r; given by (7-9), may be of limited practical usefulness, although one imagines 
that when the denominators z; are well determined it would give a single confidence interval, 
whose end-points could be found by an iterative procedure (cf. example 8-3). But a very 


simple case occurs when the constants A;, A;, Aj are proportional, that is to say when 


(A;, Aj, AG) =(A, A’, A") kK; Say. (7-11) 


For example, the quantities y;, z; (i= 1, ..., &) may have arisen from k different experiments, 
of fundamentally the same structure, but with different amounts of replication n,;. Then the 
constants x; are proportional to 1/n;. In this case (7-10) and (7-9) become simply 


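$$(\bar{y} - \mu\bar{z})^2 < [u(r)]^2 S^2 (\lambda - 2\mu\lambda'' + \mu^2\lambda'), \tag{7-12}$$

where

$$r_i = \frac{1/(\kappa_i s_i^2)}{\sum 1/(\kappa_i s_i^2)}, \qquad S^2 = \frac{1}{\sum 1/(\kappa_i s_i^2)}, \tag{7-13}$$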


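and $\bar{y}$, $\bar{z}$ are weighted means calculated with weights $1/(\kappa_i s_i^2)$. (The estimates of the variances and covariance of $\bar{y}$, $\bar{z}$ are $(\lambda, \lambda', \lambda'')S^2$.) The $r_i$ are now immediately calculable from the data, without any iteration, and in the case when $\bar{z}$ is significantly different from zero (7-12) gives at once the confidence limits

$$\mu_L, \mu_U = \frac{m - g\lambda''/\lambda' \pm (uS/\bar{z})\left[(\lambda - 2m\lambda'' + m^2\lambda') - g(\lambda - \lambda''^2/\lambda')\right]^{1/2}}{1 - g}, \tag{7-14}$$

where

$$m = \bar{y}/\bar{z}, \qquad g = u^2\lambda' S^2/\bar{z}^2, \tag{7-15}$$

and $u$ is short for $u(r)$.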

In his paper, Finney also considers the case where the weights used have been chosen without reference to the experimental data. This problem also admits a confidence-limit solution, but in this case we must use the Welch-Aspin $v$-tables instead of the $u$-tables. It is not necessary to choose the same weights for the $y$'s and the $z$'s, provided that it is possible to assume that $\eta_1 = \ldots = \eta_k$ and $\zeta_1 = \ldots = \zeta_k$. Thus suppose it has been decided to use weights $W_i$ (independent of the $s_i^2$) for the $y_i$ and $W'_i$ for the $z_i$, with $\sum W_i = \sum W'_i = 1$. Let $\bar{y}$ and $\bar{z}$











314 Accuracy of weighted means and ratios 


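denote the means $\sum W_i y_i$ and $\sum W'_i z_i$. Then the result is that the confidence limits for $\mu$ (for a significantly non-zero $\bar{z}$) are

$$\mu_L, \mu_U = \frac{m - gV''/V' \pm (v/\bar{z})\left[(V - 2mV'' + m^2V') - g(V - V''^2/V')\right]^{1/2}}{1 - g}, \tag{7-16}$$

where

$$m = \bar{y}/\bar{z}, \qquad g = v^2V'/\bar{z}^2, \qquad v = v(c), \tag{7-17}$$

and

$$V = \sum W_i^2\lambda_i s_i^2, \qquad V' = \sum W_i'^2\lambda'_i s_i^2, \qquad V'' = \sum W_i W'_i\lambda''_i s_i^2, \tag{7-18}$$

$$c_i = \frac{(W_i^2\lambda_i - 2\mu W_i W'_i\lambda''_i + \mu^2 W_i'^2\lambda'_i)s_i^2}{\sum_j (W_j^2\lambda_j - 2\mu W_j W'_j\lambda''_j + \mu^2 W_j'^2\lambda'_j)s_j^2}. \tag{7-19}$$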


Equations (7-16) and (7-19) have again to be solved iteratively, with $\mu = \mu_L$ or $\mu_U$ in (7-19), as appropriate. This method seems to have no particular advantage over that using weights supplied by the data themselves, except when data from several independent sources are being combined, and the $\lambda$'s are not proportional; in this case the simpler calculations might be a determining factor.


8. EXAMPLES OF THE USE OF THE TABLES 


Example 8-1. Twenty determinations of a quantity using a method of relatively low accuracy gave a mean result of 10.31 with a standard error (19 D.F.) of 1.08. A further set of ten determinations using a method of higher accuracy gave 11.29 ± 0.67. Assuming that both methods are unbiased, find 95% confidence limits for the true value.

Since the standard errors already refer to the means, we have

$$w_1 = 1/(\lambda_1 s_1^2) = 1/(1.08)^2 = 0.857,$$
$$w_2 = 1/(\lambda_2 s_2^2) = 1/(0.67)^2 = 2.228.$$

Hence

$$\bar{x} = \frac{(0.857)(10.31) + (2.228)(11.29)}{0.857 + 2.228} = 11.02,$$

and

$$r = \frac{0.857}{0.857 + 2.228} = 0.278.$$

Interpolation in Table 2 gives $u = 2.29$. Thus the required 95% confidence limits are

$$11.02 \pm 2.29/\sqrt{(0.857 + 2.228)} = 11.02 \pm 1.30.$$
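The arithmetic of Example 8-1 can be retraced with a short sketch. The function name and argument layout below are illustrative assumptions, not part of the paper; the critical value u = 2.29 is simply supplied as read from Table 2, since the sketch does not attempt the table interpolation.

```python
import math

def weighted_mean_limits(means, std_errors, u):
    """Weighted mean with weights w_i = 1/se_i^2, the ratio r = w_1/(w_1 + w_2),
    and the half-width u/sqrt(w_1 + w_2) of the confidence interval."""
    weights = [1.0 / se ** 2 for se in std_errors]
    w = sum(weights)
    x_bar = sum(wi * xi for wi, xi in zip(weights, means)) / w
    r = weights[0] / w
    half_width = u / math.sqrt(w)
    return x_bar, r, half_width

# Figures quoted in Example 8-1; u is taken from Table 2 by the reader.
x_bar, r, half_width = weighted_mean_limits([10.31, 11.29], [1.08, 0.67], u=2.29)
print(round(x_bar, 2), round(r, 3), round(half_width, 2))
```

The value of r is what is needed to enter the table, so in practice the weights and r are computed first and u is looked up afterwards.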


Example 8-2. Finney (1952, § 9-3) quotes data of Gridgeman (1944), obtained in an assay of vitamin A, using growth rate in male rats as response. Thirty pairs (litter mates) of rats were used, ten pairs for each of three dose levels (0.9, 1.5, 2.5 units of vitamin A in the case of the standard preparation; 0.45, 0.75, 1.25 mg. of the test preparation). One of each pair of rats was assigned to the standard preparation and the other to the corresponding dose of the test preparation. If logarithms of dose (for convenience taken to the base 5/3) are used as dose metameter, and growth in grams over 3 weeks as response metameter, then the dosage-response curves become nearly parallel straight lines, and the three dosage values are spaced one unit apart. The experiment can be analysed as a typical 'split-plot' design, 'whole plots' being pairs of rats and 'split plots' being individual rats. The mean difference in response between preparations (test minus standard) is $y = -5.233$ gm. This is a comparison made within pairs of rats, for which the error mean square in the analysis of







variance is 39.35 (27 D.F.). The slope of the dosage-response lines is estimated as $z = 9.350$ gm. per 5 : 3 increase in dose; but this is a contrast between pairs of rats, for which the error mean square is 98.60 (27 D.F.). According to the usual assumptions, $y$, $z$ and the two error mean squares are independent. If the true potency of the test preparation is $\rho$ units/mg., then $m = y/z = -0.5597$ is an estimate of $\mu = \log_{5/3} 0.5\rho$ (since the nominal potency of the test preparation, taken into account when fixing the doses to be administered, was 2.0 units/mg.). Thus the true potency is estimated as

$$R = 2(5/3)^{-0.5597} = 1.50 \text{ units/mg.}$$

Now the estimated variances of $y$ and $z$ are $\lambda s^2 = 39.35/15 = 2.624$ and $\lambda' s'^2 = 98.60/40 = 2.465$. Taking $\mu = -0.5597$ as a first approximation, equation (7-5) gives $c = 0.77$. For 95% limits, the first table in Trickett et al. (1956) gives $v = 2.02$. Thus $g = 0.115$, and equation (7-6) gives $\mu_L = -1.062$, $\mu_U = -0.203$.
Substituting these values back into (7-5) gives $c_1 = 0.486$, $c_2 = 0.963$, or $v_1 = 2.00$, $v_2 = 2.05$. These are only about 1% different from the original value of $v$, and in practice it would probably not be worth while to carry the calculation further. However, substituting these values in (7-6), using $v_1$ with the minus sign and $v_2$ with the plus, we find the corrected limits

$$\mu_L = -1.055, \qquad \mu_U = -0.198.$$

The corresponding 95% confidence limits for the potency are 1.17 and 1.81 units/mg.
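The two passes of Example 8-2 can be sketched as follows. This is only an illustration under stated assumptions: equations (7-5) and (7-6) are not reproduced in this part of the paper, so the sketch assumes that c is the share of the estimated variance of y - μz contributed by y, and that the limits have the same Fieller-type form as (7-16); both assumptions reproduce the quoted figures. The critical values 2.02, 2.00 and 2.05 are supplied by hand as read from the Trickett et al. table, and the function names are hypothetical.

```python
import math

def fieller_limit(y, z, var_y, var_z, cov_yz, crit, sign):
    """One limit for the ratio y/z; sign = -1 gives the lower, +1 the upper limit."""
    m = y / z
    g = crit ** 2 * var_z / z ** 2
    if g >= 1.0:
        raise ValueError("denominator is not significantly different from zero")
    inner = (var_y - 2 * m * cov_yz + m * m * var_z) - g * (var_y - cov_yz ** 2 / var_z)
    return (m - g * cov_yz / var_z + sign * (crit / abs(z)) * math.sqrt(inner)) / (1.0 - g)

def variance_share(mu, var_y, var_z):
    """Assumed form of c: share of var(y - mu*z) that comes from y."""
    return var_y / (var_y + mu * mu * var_z)

y, z = -5.233, 9.350
var_y, var_z = 39.35 / 15, 98.60 / 40

c0 = variance_share(y / z, var_y, var_z)                # about 0.77
lo1 = fieller_limit(y, z, var_y, var_z, 0.0, 2.02, -1)  # first pass, v = 2.02
hi1 = fieller_limit(y, z, var_y, var_z, 0.0, 2.02, +1)
c_lo = variance_share(lo1, var_y, var_z)                # about 0.486
c_hi = variance_share(hi1, var_y, var_z)                # about 0.963
lo2 = fieller_limit(y, z, var_y, var_z, 0.0, 2.00, -1)  # second pass, v_1 = 2.00
hi2 = fieller_limit(y, z, var_y, var_z, 0.0, 2.05, +1)  # second pass, v_2 = 2.05
potency_limits = [2 * (5 / 3) ** mu for mu in (lo2, hi2)]
print(round(lo2, 3), round(hi2, 3), [round(p, 2) for p in potency_limits])
```

The table look-up for v(c) is the only step left outside the sketch; everything else is direct substitution.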
Example 8-3. Finney (1952, § 12-5) gives as another example data discussed by Bliss & Rose (1940) relating to an assay of parathyroid extract, the response being the serum calcium level in dogs injected with the standard and test preparations. The design is a fairly complicated one, based on a balanced incomplete block arrangement, and the reader is referred to the above authors for full details. Doses of 0.06 and 0.12 c.c./kg. body weight of each preparation were used, there being eighteen determinations at each of the four doses. For our purposes it will suffice to say that the analysis of variance splits into two independent parts, which will be referred to here as the interblock and intrablock analyses, each providing an estimate of potency and its own error mean square. The two parts can be regarded as separate experiments for further computational purposes. The first section gives, as an estimate of $\mu = \log_{\sqrt{2}}\rho$, the quantity $y_1/z_1$, where $y_1 = 4.500$, $z_1 = 4.833$, these having the estimated variances and covariance (30 D.F.) $\lambda_1 s_1^2 = 126.8/6$, $\lambda'_1 s_1^2 = 126.8/24$, $\lambda''_1 s_1^2 = 0$. The second section gives the estimate $y_2/z_2$, where $y_2 = 3.000$, $z_2 = 6.417$, these having the estimated variances and covariance (30 D.F.) $\lambda_2 s_2^2 = 61.35/12$, $\lambda'_2 s_2^2 = 61.35/48$, $\lambda''_2 s_2^2 = 0$. It will be seen that the $\lambda$'s are proportional, and we may take, for example,

$$(\lambda_i, \lambda'_i, \lambda''_i) = (\lambda, \lambda', \lambda'')\,\kappa_i = (\tfrac{1}{12}, \tfrac{1}{48}, 0)\,\kappa_i,$$

where $\kappa_1 = 2$, $\kappa_2 = 1$ (see equation (7-11)). The means, using weights $1/(\kappa_1 s_1^2)$, $1/(\kappa_2 s_2^2)$, are

$$\bar{y} = 3.292, \qquad \bar{z} = 6.108.$$

Thus from (7-13) and (7-15),

$$m = 0.5390, \qquad r = 0.195, \qquad S^2 = 49.40.$$

From Table 2, the appropriate value of $u$ for 95% limits is 2.07, so that $g = 0.118$ and (7-14) gives the limits

$$\mu_L = -0.151, \qquad \mu_U = +1.378.$$

The corresponding relative potencies are 0.949 and 1.61.
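The proportional case of Example 8-3 needs no iteration, which the following sketch tries to make concrete. The function and variable names are illustrative assumptions; the formulas are those numbered (7-13) to (7-15) in the text, with u = 2.07 supplied as read from Table 2. The conversion to relative potency assumes the logarithm of potency is taken to base √2, which is the base that reproduces the quoted potencies.

```python
import math

def proportional_case(y, z, s2, kappa, lam, lam_p, lam_pp, u):
    """Weighted means with weights 1/(kappa_i s_i^2), then Fieller-type limits."""
    weights = [1.0 / (k * s) for k, s in zip(kappa, s2)]
    total = sum(weights)
    r = [w / total for w in weights]
    S2 = 1.0 / total
    y_bar = sum(w * v for w, v in zip(weights, y)) / total
    z_bar = sum(w * v for w, v in zip(weights, z)) / total
    m = y_bar / z_bar
    g = u * u * lam_p * S2 / z_bar ** 2
    inner = (lam - 2 * m * lam_pp + m * m * lam_p) - g * (lam - lam_pp ** 2 / lam_p)
    half = (u * math.sqrt(S2) / abs(z_bar)) * math.sqrt(inner)
    centre = m - g * lam_pp / lam_p
    limits = ((centre - half) / (1 - g), (centre + half) / (1 - g))
    return {"r": r, "S2": S2, "y_bar": y_bar, "z_bar": z_bar, "m": m, "g": g, "limits": limits}

res = proportional_case(
    y=[4.500, 3.000], z=[4.833, 6.417],
    s2=[126.8, 61.35], kappa=[2, 1],
    lam=1 / 12, lam_p=1 / 48, lam_pp=0.0, u=2.07,
)
# Relative potency, assuming mu is the log of potency to base sqrt(2).
potencies = [math.sqrt(2) ** mu for mu in res["limits"]]
print(round(res["m"], 4), round(res["r"][0], 3), round(res["S2"], 2), round(res["g"], 3))
```

Since r depends only on the data here, it can be computed before entering the table, and the limits then follow in a single step.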













Up till now we have neglected a complication in Finney's analysis. This is the introduction of the body weight of the dogs as a concomitant variate in the intrablock analysis. (No significant reduction in error variance was obtained in the interblock analysis.) This yields the modified values

$$y_2 = 2.511, \quad z_2 = 6.270, \quad s_2^2 = 49.74, \quad \lambda_2 = 0.08393, \quad \lambda'_2 = 0.02089, \quad \lambda''_2 = 0.00018.$$

The error degrees of freedom in this section of the analysis are now only 29. The introduction of the corrections for body weight has caused a slight correlation between $y_2$ and $z_2$, and the $\lambda$'s are no longer quite proportional. However, equations (7-9) and (7-10) can be used; after somewhat tedious arithmetic we find the limits 0.943, 1.53 for the relative potency ($P = 0.95$).


9. MULTIVARIATE GENERALIZATION OF THE WEIGHTED MEAN PROBLEM 


The problem considered in § 1 can be generalized as follows. We suppose that $x_1, \ldots, x_k$ are $k$ $p$-variate vectors (for example, the means of $k$ $p$-variate samples) which are distributed independently and normally about the same mean-point (centroid) $\xi$, but with possibly different dispersion matrices $\alpha_1, \ldots, \alpha_k$. The latter are unknown, but independent estimates $a_1, \ldots, a_k$ of them (for example, the dispersion matrices of the $k$ samples) are available; these are supposed to have Wishart distributions with $\nu_1, \ldots, \nu_k$ degrees of freedom. If $W_i = a_i^{-1}$ and

$$W = \sum W_i \quad \text{and} \quad W\bar{x} = \sum W_i x_i, \tag{9-1}$$

then it is known that if the $\nu_i$ are large then $(\bar{x} - \xi)' W (\bar{x} - \xi)$ has approximately a $\chi^2$-distribution with $p$ degrees of freedom. More precisely, making the appropriate substitutions in equation (5-28) of James (1954) we find that if we define


2 
ro 


2 4 
2h=xt+E— J2(p+ WW, + ( x - 2%") te(W-W,)" 








P(p+2)  p 
| i aes Ww.) 2 
+a(spaq 8p) e wow}, 02) 
where

$$[\Gamma(\tfrac{1}{2}p)]^{-1}\int_0^{\frac{1}{2}\chi^2} t^{\frac{1}{2}p - 1} e^{-t}\,dt = P, \tag{9-3}$$

then

$$\Pr\left[(\bar{x} - \xi)' W (\bar{x} - \xi) < 2h\right] = P + O(\nu_i^{-2}). \tag{9-4}$$

This equation can be used to give an ellipsoidal confidence region for the vector $\xi$, with confidence coefficient approximately equal to $P$, if the degrees of freedom $\nu_i$ are reasonably large.
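As a rough illustration of the large-sample version of this region, the sketch below treats the bivariate case p = 2 using only the leading χ² term, so it ignores the correction terms of order 1/ν that enter 2h. For p = 2 the χ² percentage point defined by (9-3) has the closed form -2 log(1 - P). The helper names and the numerical data are hypothetical and serve only to show the steps: invert each estimated dispersion matrix, sum the weights, solve for the pooled centroid, and test a candidate point against the quadratic form.

```python
import math

def inv2(a):
    """Inverse of a 2x2 matrix given as nested lists."""
    (p, q), (r, s) = a
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]

def matvec(a, v):
    return [a[0][0] * v[0] + a[0][1] * v[1], a[1][0] * v[0] + a[1][1] * v[1]]

def pooled_centroid(xs, dispersions):
    """W = sum of inverse dispersion estimates; x_bar solves W x_bar = sum W_i x_i."""
    weights = [inv2(a) for a in dispersions]
    W = [[sum(w[i][j] for w in weights) for j in range(2)] for i in range(2)]
    rhs = [sum(matvec(w, x)[i] for w, x in zip(weights, xs)) for i in range(2)]
    return matvec(inv2(W), rhs), W

def quadratic_form(point, centre, W):
    d = [point[0] - centre[0], point[1] - centre[1]]
    Wd = matvec(W, d)
    return d[0] * Wd[0] + d[1] * Wd[1]

def chi2_point_p2(P):
    """Percentage point of chi-square with 2 degrees of freedom."""
    return -2.0 * math.log(1.0 - P)

# Hypothetical sample means and estimated dispersion matrices of those means.
xs = [[1.0, 2.0], [1.4, 1.6]]
dispersions = [[[0.5, 0.1], [0.1, 0.4]], [[0.2, 0.0], [0.0, 0.3]]]
x_bar, W = pooled_centroid(xs, dispersions)
limit = chi2_point_p2(0.95)
inside = quadratic_form([1.2, 1.8], x_bar, W) < limit
print(x_bar, round(limit, 3), inside)
```

With small degrees of freedom this region is too narrow, which is exactly what the correction from χ² to 2h is intended to repair.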

I would like to acknowledge the assistance of Dr D. W. J. Cruikshank, Miss D. E. Pilling 
and Mr J. F. P. Donovan, of the Department of Inorganic and Structural Chemistry, Leeds 
University, in facilitating the use of the Manchester University automatic digital computer 
(Mk. I). 

REFERENCES 
ASPIN, A. A. (1948). An examination and further development of a formula arising in the problem of comparing two mean values. Biometrika, 35, 88-96.

ASPIN, A. A. (1949). Tables for use in comparisons whose accuracy involves two variances, separately estimated. Biometrika, 36, 290-6.

BEHRENS, W. V. (1929). Ein Beitrag zur Fehlerberechnung bei wenigen Beobachtungen. Landw. Jb. 68, 807-37.













BLISS, C. I. & ROSE, C. L. (1940). The assay of parathyroid extract from the serum calcium of dogs. Amer. J. Hyg. A, 31, 79-98.

CREASY, M. A. (1954). Limits for the ratio of means. J. R. Statist. Soc. B, 16, 186-94.

FIELLER, E. C. (1940). The biological standardization of insulin. J. R. Statist. Soc. Suppl. 7, 1-64.

FIELLER, E. C. (1954). Some problems in interval estimation. J. R. Statist. Soc. B, 16, 175-85.

FINNEY, D. J. (1950). Two new uses of the Behrens-Fisher distribution. J. R. Statist. Soc. B, 12, 293-300.

FINNEY, D. J. (1952). Statistical Method in Biological Assay. London: Griffin.

FISHER, R. A. (1935). The fiducial argument in statistical inference. Ann. Eugen., Lond., 6, 391-8.

FISHER, R. A. (1939). The comparison of samples with possibly unequal variances. Ann. Eugen., Lond., 9, 174-80.

FISHER, R. A. (1941). The asymptotic approach to Behrens's integral, with further tables for the d test of significance. Ann. Eugen., Lond., 11, 141-72.

FISHER, R. A. & YATES, F. (1953). Statistical Tables for Biological, Agricultural and Medical Research. 4th ed. Edinburgh: Oliver and Boyd.

GRIDGEMAN, N. T. (1944). The Estimation of Vitamin A. London: Lever Brothers and Unilever Ltd.

JAMES, G. S. (1954). Tests of linear hypotheses in univariate and multivariate analysis when the ratios of the population variances are unknown. Biometrika, 41, 19-43.

JEFFREYS, H. (1940). Note on the Behrens-Fisher formula. Ann. Eugen., Lond., 10, 48-51.

MAULDON, J. G. (1955). Pivotal quantities for Wishart's and related distributions, and a paradox in fiducial theory. J. R. Statist. Soc. B, 17, 79-85.

PEARSON, E. S. & HARTLEY, H. O. (1954). Biometrika Tables for Statisticians, 1. Cambridge: The University Press for the Biometrika Trustees.

SUKHATME, P. V. (1938). On Fisher and Behrens' test of significance for the difference in means of two normal samples. Sankhyā, 4, 39-48.

TRICKETT, W. H. & WELCH, B. L. (1954). On the comparison of two means: further discussion of iterative methods for calculating tables. Biometrika, 41, 361-74.

TRICKETT, W. H., WELCH, B. L. & JAMES, G. S. (1956). Further critical values for the two-means problem. Biometrika, 43, 203-5.

WELCH, B. L. (1947a). The generalization of 'Student's' problem when several different population variances are involved. Biometrika, 34, 28-35.

WELCH, B. L. (1947b). On the Studentization of several variances. Ann. Math. Statist. 18, 118-22.

YATES, F. (1939). An apparent inconsistency arising from tests of significance based on fiducial distributions of unknown parameters. Proc. Camb. Phil. Soc. 35, 579-91.










Table 1. Upper 5% critical values of $u = (\bar{x} - \mu)\sqrt{(w_1 + w_2)}$
(i.e. upper 10% critical values of $|u|$)

[The body of Table 1 is not recoverable from this copy.]

$\bar{x} = (w_1x_1 + w_2x_2)/(w_1 + w_2)$ is the weighted mean of two independent normally distributed variables $x_1$ and $x_2$ which have the same expected value $\mu$, and variances $\lambda_1\sigma_1^2$ and $\lambda_2\sigma_2^2$ respectively. Independently distributed estimates $s_1^2$, $s_2^2$ of $\sigma_1^2$, $\sigma_2^2$ based on $\nu_1$, $\nu_2$ degrees of freedom are available, and the weights used are $w_1 = 1/(\lambda_1 s_1^2)$, $w_2 = 1/(\lambda_2 s_2^2)$.

For example, if the $x_i$ and $s_i^2$ ($i = 1, 2$) denote the means and variances of two samples of sizes $n_i$ taken from two normal populations with the same mean $\mu$, then $\nu_i = n_i - 1$ and $\lambda_i = 1/n_i$.
































Table 2. Upper 2½% critical values of $u = (\bar{x} - \mu)\sqrt{(w_1 + w_2)}$
(i.e. upper 5% critical values of $|u|$)

                        r = w1/(w1 + w2)
 ν2   ν1    0.0   0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9   1.0

  8    8   2.31  2.37  2.43  2.45  2.47  2.47  2.47  2.45  2.43  2.37  2.31
      10   2.31  2.35  2.38  2.40  2.41  2.41  2.40  2.39  2.35  2.30  2.23
      12   2.31  2.34  2.36  2.37  2.37  2.37  2.36  2.34  2.31  2.25  2.18
      15   2.31  2.32  2.34  2.34  2.34  2.33  2.32  2.30  2.26  2.20  2.13
      20   2.31  2.31  2.32  2.31  2.31  2.30  2.28  2.26  2.22  2.16  2.09
       ∞   2.31  2.29  2.27  2.24  2.22  2.20  2.17  2.14  2.10  2.04  1.96

 10    8   2.23  2.30  2.35  2.39  2.40  2.41  2.41  2.40  2.38  2.35  2.31
      10   2.23  2.27  2.31  2.33  2.35  2.35  2.35  2.33  2.31  2.27  2.23
      12   2.23  2.26  2.29  2.30  2.31  2.31  2.30  2.29  2.27  2.23  2.18
      15   2.23  2.25  2.27  2.28  2.28  2.27  2.27  2.25  2.22  2.18  2.13
      20   2.23  2.24  2.25  2.25  2.25  2.24  2.23  2.21  2.18  2.14  2.09
       ∞   2.23  2.21  2.20  2.18  2.16  2.14  2.12  2.09  2.06  2.02  1.96

 12    8   2.18  2.25  2.31  2.34  2.36  2.37  2.37  2.37  2.36  2.34  2.31
      10   2.18  2.23  2.27  2.29  2.30  2.31  2.31  2.30  2.29  2.26  2.23
      12   2.18  2.21  2.24  2.26  2.27  2.27  2.27  2.26  2.24  2.21  2.18
      15   2.18  2.20  2.22  2.23  2.24  2.24  2.23  2.22  2.20  2.17  2.13
      20   2.18  2.19  2.20  2.21  2.21  2.20  2.19  2.18  2.16  2.13  2.09
       ∞   2.18  2.17  2.16  2.14  2.13  2.11  2.09  2.07  2.04  2.00  1.96

 15    8   2.13  2.20  2.26  2.30  2.32  2.33  2.34  2.34  2.34  2.32  2.31
      10   2.13  2.18  2.22  2.25  2.27  2.27  2.28  2.28  2.27  2.25  2.23
      12   2.13  2.17  2.20  2.22  2.23  2.24  2.24  2.23  2.22  2.20  2.18
      15   2.13  2.16  2.18  2.19  2.20  2.20  2.20  2.19  2.18  2.16  2.13
      20   2.13  2.15  2.16  2.17  2.17  2.17  2.16  2.15  2.14  2.11  2.09
       ∞   2.13  2.12  2.11  2.10  2.09  2.08  2.06  2.04  2.02  1.99  1.96

 20    8   2.09  2.16  2.22  2.26  2.28  2.30  2.31  2.31  2.32  2.31  2.31
      10   2.09  2.14  2.18  2.21  2.23  2.24  2.25  2.25  2.25  2.24  2.23
      12   2.09  2.13  2.16  2.18  2.19  2.20  2.21  2.21  2.20  2.19  2.18
      15   2.09  2.11  2.14  2.15  2.16  2.17  2.17  2.17  2.16  2.15  2.13
      20   2.09  2.10  2.12  2.13  2.13  2.14  2.13  2.13  2.12  2.10  2.09
       ∞   2.09  2.08  2.07  2.07  2.06  2.05  2.03  2.02  2.00  1.98  1.96

  ∞    8   1.96  2.04  2.10  2.14  2.17  2.20  2.22  2.24  2.27  2.29  2.31
      10   1.96  2.02  2.06  2.09  2.12  2.14  2.16  2.18  2.20  2.21  2.23
      12   1.96  2.00  2.04  2.07  2.09  2.11  2.13  2.14  2.16  2.17  2.18
      15   1.96  1.99  2.02  2.04  2.06  2.08  2.09  2.10  2.11  2.12  2.13
      20   1.96  1.98  2.00  2.02  2.03  2.05  2.06  2.07  2.07  2.08  2.09
       ∞   1.96  1.96  1.96  1.96  1.96  1.96  1.96  1.96  1.96  1.96  1.96

$\bar{x} = (w_1x_1 + w_2x_2)/(w_1 + w_2)$ is the weighted mean of two independent normally distributed variables $x_1$ and $x_2$, which have the same expected value $\mu$, and variances $\lambda_1\sigma_1^2$ and $\lambda_2\sigma_2^2$ respectively. Independently distributed estimates $s_1^2$, $s_2^2$ of $\sigma_1^2$, $\sigma_2^2$ based on $\nu_1$, $\nu_2$ degrees of freedom are available, and the weights used are $w_1 = 1/(\lambda_1 s_1^2)$, $w_2 = 1/(\lambda_2 s_2^2)$.

For example, if the $x_i$ and $s_i^2$ ($i = 1, 2$) denote the means and variances of two samples of sizes $n_i$ taken from two normal populations with the same mean $\mu$, then $\nu_i = n_i - 1$ and $\lambda_i = 1/n_i$.
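Entering Table 2 between tabulated arguments requires interpolation in r and in both degrees of freedom. The sketch below assumes simple linear interpolation in all three arguments, which is a convenience and not necessarily the scheme intended by the author, and it copies only the four rows of Table 2 needed for the case of Example 8-1 (ν1 = 19, ν2 = 9, r = 0.278). The dictionary layout and function names are hypothetical.

```python
# Rows of Table 2 (upper 2.5% points) keyed by (nu2, nu1); columns are r = 0.0, 0.1, ..., 1.0.
TABLE2_ROWS = {
    (8, 15): [2.31, 2.32, 2.34, 2.34, 2.34, 2.33, 2.32, 2.30, 2.26, 2.20, 2.13],
    (8, 20): [2.31, 2.31, 2.32, 2.31, 2.31, 2.30, 2.28, 2.26, 2.22, 2.16, 2.09],
    (10, 15): [2.23, 2.25, 2.27, 2.28, 2.28, 2.27, 2.27, 2.25, 2.22, 2.18, 2.13],
    (10, 20): [2.23, 2.24, 2.25, 2.25, 2.25, 2.24, 2.23, 2.21, 2.18, 2.14, 2.09],
}

def interp_r(row, r):
    """Linear interpolation along a row tabulated at r = 0.0, 0.1, ..., 1.0."""
    i = min(int(r * 10), 9)
    frac = r * 10 - i
    return row[i] + frac * (row[i + 1] - row[i])

def lerp(x, x0, x1, f0, f1):
    return f0 + (x - x0) / (x1 - x0) * (f1 - f0)

def u_from_table2(r, nu1, nu2):
    """Valid only for 15 <= nu1 <= 20 and 8 <= nu2 <= 10 with the rows copied above."""
    at_nu2_8 = lerp(nu1, 15, 20, interp_r(TABLE2_ROWS[(8, 15)], r), interp_r(TABLE2_ROWS[(8, 20)], r))
    at_nu2_10 = lerp(nu1, 15, 20, interp_r(TABLE2_ROWS[(10, 15)], r), interp_r(TABLE2_ROWS[(10, 20)], r))
    return lerp(nu2, 8, 10, at_nu2_8, at_nu2_10)

u = u_from_table2(0.278, nu1=19, nu2=9)
print(round(u, 2))
```

Because the tabulated values are rounded to two decimals, other reasonable interpolation rules (for instance, interpolating in 1/ν) may differ from this in the last figure.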










Table 3. Upper 1% critical values of $u = (\bar{x} - \mu)\sqrt{(w_1 + w_2)}$
(i.e. upper 2% critical values of $|u|$)

                        r = w1/(w1 + w2)
 ν2   ν1    0.0   0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9   1.0

 10   10   2.76  2.80  2.84  2.87  2.89  2.89  2.89  2.87  2.84  2.80  2.76
      12   2.76  2.79  2.81  2.83  2.83  2.83  2.82  2.80  2.77  2.73  2.68
      15   2.76  2.77  2.78  2.78  2.78  2.78  2.76  2.74  2.70  2.65  2.60
      20   2.76  2.76  2.75  2.75  2.74  2.73  2.71  2.68  2.64  2.58  2.53
      30   2.76  2.75  2.73  2.71  2.70  2.68  2.65  2.62  2.57  2.52  2.46
       ∞   2.76  2.73  2.69  2.66  2.63  2.59  2.55  2.51  2.46  2.40  2.33

 12   10   2.68  2.73  2.77  2.80  2.82  2.83  2.83  2.83  2.81  2.79  2.76
      12   2.68  2.71  2.74  2.76  2.77  2.77  2.77  2.76  2.74  2.71  2.68
      15   2.68  2.69  2.71  2.72  2.72  2.72  2.71  2.70  2.67  2.64  2.60
      20   2.68  2.68  2.68  2.68  2.68  2.67  2.66  2.64  2.61  2.57  2.53
      30   2.68  2.67  2.66  2.65  2.64  2.62  2.60  2.58  2.54  2.50  2.46
       ∞   2.68  2.65  2.62  2.60  2.57  2.54  2.51  2.47  2.43  2.38  2.33

 15   10   2.60  2.65  2.70  2.74  2.76  2.78  2.78  2.78  2.78  2.77  2.76
      12   2.60  2.64  2.67  2.70  2.71  2.72  2.72  2.72  2.71  2.69  2.68
      15   2.60  2.62  2.64  2.66  2.66  2.67  2.66  2.66  2.64  2.62  2.60
      20   2.60  2.61  2.62  2.62  2.62  2.62  2.61  2.60  2.58  2.55  2.53
      30   2.60  2.60  2.59  2.59  2.58  2.57  2.56  2.54  2.52  2.49  2.46
       ∞   2.60  2.58  2.56  2.54  2.51  2.49  2.46  2.44  2.40  2.37  2.33

 20   10   2.53  2.58  2.64  2.68  2.71  2.73  2.74  2.75  2.75  2.76  2.76
      12   2.53  2.57  2.61  2.64  2.66  2.67  2.68  2.68  2.68  2.68  2.68
      15   2.53  2.55  2.58  2.60  2.61  2.62  2.62  2.62  2.62  2.61  2.60
      20   2.53  2.54  2.55  2.56  2.57  2.57  2.57  2.56  2.55  2.54  2.53
      30   2.53  2.53  2.53  2.53  2.53  2.53  2.52  2.51  2.49  2.48  2.46
       ∞   2.53  2.51  2.50  2.48  2.46  2.44  2.43  2.40  2.38  2.35  2.33

 30   10   2.46  2.52  2.57  2.62  2.65  2.68  2.70  2.71  2.73  2.75  2.76
      12   2.46  2.50  2.54  2.58  2.60  2.62  2.64  2.65  2.66  2.67  2.68
      15   2.46  2.49  2.52  2.54  2.56  2.57  2.58  2.59  2.59  2.60  2.60
      20   2.46  2.48  2.49  2.51  2.52  2.53  2.53  2.53  2.53  2.53  2.53
      30   2.46  2.46  2.47  2.48  2.48  2.48  2.48  2.48  2.47  2.46  2.46
       ∞   2.46  2.45  2.44  2.43  2.41  2.40  2.39  2.38  2.36  2.34  2.33

  ∞   10   2.33  2.40  2.46  2.51  2.55  2.59  2.63  2.66  2.69  2.73  2.76
      12   2.33  2.38  2.43  2.47  2.51  2.54  2.57  2.60  2.62  2.65  2.68
      15   2.33  2.37  2.40  2.44  2.46  2.49  2.51  2.54  2.56  2.58  2.60
      20   2.33  2.35  2.38  2.40  2.43  2.44  2.46  2.48  2.50  2.51  2.53
      30   2.33  2.34  2.36  2.38  2.39  2.40  2.41  2.43  2.44  2.45  2.46
       ∞   2.33  2.33  2.33  2.33  2.33  2.33  2.33  2.33  2.33  2.33  2.33

$\bar{x} = (w_1x_1 + w_2x_2)/(w_1 + w_2)$ is the weighted mean of two independent normally distributed variables $x_1$ and $x_2$, which have the same expected value $\mu$, and variances $\lambda_1\sigma_1^2$ and $\lambda_2\sigma_2^2$ respectively. Independently distributed estimates $s_1^2$, $s_2^2$ of $\sigma_1^2$, $\sigma_2^2$ based on $\nu_1$, $\nu_2$ degrees of freedom are available, and the weights used are $w_1 = 1/(\lambda_1 s_1^2)$, $w_2 = 1/(\lambda_2 s_2^2)$.

For example, if the $x_i$ and $s_i^2$ ($i = 1, 2$) denote the means and variances of two samples of sizes $n_i$ taken from two normal populations with the same mean $\mu$, then $\nu_i = n_i - 1$ and $\lambda_i = 1/n_i$.



















Table 4. Upper ½% critical values of $u = (\bar{x} - \mu)\sqrt{(w_1 + w_2)}$
(i.e. upper 1% critical values of $|u|$)

                        r = w1/(w1 + w2)
 ν2   ν1    0.0   0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9   1.0

 10   10   3.17  3.20  3.24  3.27  3.29  3.30  3.29  3.27  3.24  3.20  3.17
      12   3.17  3.18  3.20  3.21  3.22  3.22  3.21  3.18  3.14  3.09  3.05
      15   3.17  3.16  3.16  3.16  3.16  3.15  3.13  3.10  3.05  3.00  2.95
      20   3.17  3.14  3.13  3.11  3.10  3.08  3.05  3.02  2.96  2.90  2.85
      30   3.17  3.13  3.10  3.07  3.05  3.02  2.99  2.94  2.88  2.82  2.75
       ∞   3.17  3.11  3.06  3.01  2.96  2.91  2.86  2.80  2.74  2.66  2.58

 12   10   3.05  3.09  3.14  3.18  3.21  3.22  3.22  3.21  3.20  3.18  3.17
      12   3.05  3.07  3.10  3.12  3.14  3.14  3.14  3.12  3.10  3.07  3.05
      15   3.05  3.06  3.07  3.07  3.08  3.07  3.06  3.04  3.01  2.98  2.95
      20   3.05  3.04  3.03  3.03  3.02  3.01  2.99  2.96  2.93  2.88  2.85
      30   3.05  3.03  3.01  2.99  2.97  2.95  2.92  2.89  2.85  2.80  2.75
       ∞   3.05  3.01  2.97  2.93  2.89  2.84  2.80  2.75  2.70  2.64  2.58

 15   10   2.95  3.00  3.05  3.10  3.13  3.15  3.16  3.16  3.16  3.16  3.17
      12   2.95  2.98  3.01  3.04  3.06  3.07  3.08  3.07  3.07  3.06  3.05
      15   2.95  2.96  2.98  2.99  3.00  3.00  3.00  2.99  2.98  2.96  2.95
      20   2.95  2.94  2.95  2.95  2.95  2.94  2.93  2.91  2.89  2.87  2.85
      30   2.95  2.93  2.92  2.91  2.90  2.88  2.86  2.84  2.81  2.78  2.75
       ∞   2.95  2.91  2.88  2.85  2.81  2.78  2.75  2.71  2.67  2.62  2.58

 20   10   2.85  2.90  2.96  3.02  3.05  3.08  3.10  3.11  3.13  3.14  3.17
      12   2.85  2.88  2.93  2.96  2.99  3.01  3.02  3.03  3.03  3.04  3.05
      15   2.85  2.87  2.89  2.91  2.93  2.94  2.95  2.95  2.95  2.94  2.95
      20   2.85  2.85  2.86  2.87  2.88  2.88  2.88  2.87  2.86  2.85  2.85
      30   2.85  2.84  2.84  2.83  2.83  2.82  2.81  2.80  2.78  2.77  2.75
       ∞   2.85  2.82  2.80  2.77  2.75  2.72  2.70  2.67  2.64  2.61  2.58

 30   10   2.75  2.82  2.88  2.94  2.99  3.02  3.05  3.07  3.10  3.13  3.17
      12   2.75  2.80  2.85  2.89  2.92  2.95  2.97  2.99  3.01  3.03  3.05
      15   2.75  2.78  2.81  2.84  2.86  2.88  2.90  2.91  2.92  2.93  2.95
      20   2.75  2.77  2.78  2.80  2.81  2.82  2.83  2.83  2.84  2.84  2.85
      30   2.75  2.75  2.76  2.76  2.76  2.77  2.76  2.76  2.76  2.75  2.75
       ∞   2.75  2.73  2.72  2.70  2.69  2.67  2.65  2.63  2.62  2.60  2.58

  ∞   10   2.58  2.66  2.74  2.80  2.86  2.91  2.96  3.01  3.06  3.11  3.17
      12   2.58  2.64  2.70  2.75  2.80  2.84  2.89  2.93  2.97  3.01  3.05
      15   2.58  2.62  2.67  2.71  2.75  2.78  2.81  2.85  2.88  2.91  2.95
      20   2.58  2.61  2.64  2.67  2.70  2.72  2.75  2.77  2.80  2.82  2.85
      30   2.58  2.60  2.62  2.63  2.65  2.67  2.69  2.70  2.72  2.73  2.75
       ∞   2.58  2.58  2.58  2.58  2.58  2.58  2.58  2.58  2.58  2.58  2.58

$\bar{x} = (w_1x_1 + w_2x_2)/(w_1 + w_2)$ is the weighted mean of two independent normally distributed variables $x_1$ and $x_2$ which have the same expected value $\mu$, and variances $\lambda_1\sigma_1^2$ and $\lambda_2\sigma_2^2$ respectively. Independently distributed estimates $s_1^2$, $s_2^2$ of $\sigma_1^2$, $\sigma_2^2$ based on $\nu_1$, $\nu_2$ degrees of freedom are available, and the weights used are $w_1 = 1/(\lambda_1 s_1^2)$, $w_2 = 1/(\lambda_2 s_2^2)$.

For example, if the $x_i$ and $s_i^2$ ($i = 1, 2$) denote the means and variances of two samples of sizes $n_i$ taken from two normal populations with the same mean $\mu$, then $\nu_i = n_i - 1$ and $\lambda_i = 1/n_i$.












ON ESTIMATING THE LATENT AND INFECTIOUS 
PERIODS OF MEASLES 


II. FAMILIES WITH THREE OR MORE SUSCEPTIBLES 


By NORMAN T. J. BAILEY 
Design and Analysis of Scientific Experiment, 6 Keble Road, Oxford 


1. INTRODUCTION 


In a previous paper (Bailey (1956), referred to here as Part I) I discussed, for families with 
only two susceptibles, the maximum-likelihood estimation of parameters in an epidemic 
model involving a normally distributed latent period after the receipt of infection, followed 
by a constant infectious period terminating with the appearance of symptoms and removal 
of the patient from circulation. The present paper gives the extensions required for dealing 
with families having more than two susceptibles. The case of three susceptibles, which is 
quite common, is described in some detail as it presents a few new features. Larger families 
can be analysed in a similar way, but the procedure becomes rapidly more complicated, 
especially as the misclassification of links in the chain develops into an item of major im- 
portance. Since there is at present little data on such families we shall give an indication 
only of how to take account of all the complications involved. 

An illustrative example is provided here for families with three susceptibles using some 
excellent material that has very kindly been made available to me by Dr R. E. Hope 
Simpson. The data are based on families residing in the Cirencester area during 1946-52 
with three susceptible children under 15 years of age including at least one case of measles. 


2. DESCRIPTION OF DATA FOR FAMILIES WITH THREE SUSCEPTIBLES 


As shown in Part I, for families with only two susceptibles, the data, consisting of say a total of N families with at least one case, fall naturally into three parts. There are A families with two cases, both having been infected simultaneously by an outside contact. There are a further B families with two cases, where the second case is derived from the first by cross-infection within the family. We also know the time interval between the two cases, w for the first type and z for the second. As a first approximation we assume that these two types, which are labelled (2) and (1²) in chain-binomial notation (Bailey, 1955), can be accurately distinguished. The third type of family, of which there are C, so that N = A + B + C, contains only a single case and is labelled (1).

When dealing with families having three susceptibles the parts of the data involving one or two cases only can again be described as above, and we shall use the same notation. But we also have in addition D families containing three cases, so that now N = A + B + C + D. These families are of four kinds represented by the chain-binomial symbols (1³), (12), (3) and (21), with actual numbers E, F, G and H, respectively, where D = E + F + G + H. This time there are two time intervals to be recorded: u between the first and second cases, and v between the second and third cases. Assuming for the time being that the different types of chain can be correctly identified, we must consider the type of information they provide in some detail.







First, take the E families of type (1³), where the second case is derived from the first, and the third from the second, both by cross-infection within the family. The two intervals, u and v, are clearly each a 'z-type' variable such as arises from the B families giving a (1²) pattern. Secondly, the F families of type (12), where the second and third cases have been simultaneously infected by the first, also provide two z-type observations, but these are now given by u and u + v. We therefore have in all J = B + 2(E + F) z-type observations which can be subjected to the analysis already presented in Part I.

Next, we consider the G families of type (3), where all three cases have been simultaneously 
infected by an outside contact. These must be examined separately as they cannot be put 
in terms of other types of data already discussed. The appropriate analysis is given in the 
next section. 

The H families of type (21), where the first two cases are derived from a simultaneous 
infection and the third by cross-infection, present a special difficulty. The first interval 
is plainly a ‘w-type’ variable and can be taken in conjunction with the data for the A 
families of type (2), giving K = A+H families altogether. The second interval v has a very 
much more complicated distribution, since the third case could have been infected during 
either of the infectious periods of the first two cases. It seems better to ignore the small 
amount of information available from this source in order to avoid excessive complexity. 


Table 1. Distribution of time interval for fifteen families of three susceptibles with two cases

[The individual daily frequencies are not recoverable from this copy; only the summary rows survive.]

Time interval in days       0 to 4        5 and over
Probable type of chain      (2)           (1²)
Total no.                   A = 4         B = 11








Finally, we can analyse the data according to the numbers of families showing different types of chain, apart from the time intervals involved. The main subdivision here depends on whether the disease is introduced by a single case or by two simultaneously infected cases. A triple introduction adds nothing further to what has already been mentioned in the last paragraph but one. When there are two initial cases, i.e. A of type (2) or H of type (21), the treatment is very similar to that given in Part I for the distribution of B, given B + C. When there is a single introduction we have types (1), (1²), (1³) and (12), with observed numbers C, B, E and F, where we write M = B + C + E + F.
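The bookkeeping of this section is easy to check mechanically. The sketch below takes the observed chain counts as they appear in Tables 1, 2a and 5 of this paper and forms the derived totals D, N, J, K and M defined in the text; the dictionary and variable names are merely illustrative.

```python
# Observed numbers of families by chain type, as given in Tables 1, 2a and 5.
counts = {
    "A": 4,   # type (2)
    "B": 11,  # type (1^2)
    "C": 6,   # type (1)
    "E": 6,   # type (1^3)
    "F": 37,  # type (12)
    "G": 7,   # type (3)
    "H": 7,   # type (21)
}

D = counts["E"] + counts["F"] + counts["G"] + counts["H"]  # families with three cases
N = counts["A"] + counts["B"] + counts["C"] + D            # all families with at least one case
J = counts["B"] + 2 * (counts["E"] + counts["F"])          # z-type observations
K = counts["A"] + counts["H"]                              # w-type observations
M = counts["B"] + counts["C"] + counts["E"] + counts["F"]  # single introductions

print(D, N, J, K, M)
```

The totals D = 57, J = 97, K = 11 and M = 60 agree with those printed in Tables 2 to 5.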

A new point arises here in connexion with the possibility of variations in the chance of 
infection. As shown in Part I the z-type distribution is relatively insensitive to such varia- 
tions, while the w-type variables are quite unaffected. The mean frequencies of A and H 
will also be unchanged. Variation in the chance of infection is therefore ignored so far as 
those sections of the data are concerned. However, it has been shown (Bailey, 1953) that 
the chain-binomial analysis of families of three susceptibles with a single initial case is 
considerably influenced by variations in the chance of infection. An additional parameter 
must therefore be introduced when this part of the data is analysed. 

The last matter to be mentioned in this section is the question of correctly identifying the 
links of the chains involved. The distinction between (1²) and (2) is accomplished in a similar
manner to that described in Part I. Table 1 shows the distribution of fifteen observations 




of this sort with an approximate dichotomy being made, as before, between 4 and 5 days. 
It is a little more difficult to separate the four groups of families with three cases and we now 
need to inspect a two-way table. The distribution of fifty-seven observations is shown in 


Table 2. Distribution of time intervals (u, v) for fifty-seven families of three susceptibles with three cases

[Two-way frequency table: columns give the time (u) between first and second cases in days, from 0 to 16; rows give the time (v) between second and third cases in days, from 0 to 14. The individual cell entries are not recoverable from this copy; the marginal totals are listed below, with the dividing line between 4 and 5 days marked by a bar.]

Row totals for v = 0, 1, ..., 14:            14, 12, 12, 4, 2 | 3, 1, 0, 1, 2, 1, 1, 2, 1, 1
Column totals as printed, in order of u:     6, 4, 3, 1 | 3, 3, 3, 9, 6, 10, 2, 2, 3, 1, 1
Total:                                       57 = D











Table 2a. Summary of Table 2 giving probable types of chain with observed numbers, based on a dividing line between 4 and 5 days

                        u: 0 to 4 days      u: 5 days and over
v: 0 to 4 days          (3)   G = 7         (12)  F = 37
v: 5 days and over      (21)  H = 7         (1³)  E = 6





Tables 2 and 2a. Inspection of the marginal totals suggests that an appropriate division can again be made between 4 and 5 days, but there is now a somewhat greater possibility of overlap than appeared in Part I. As mentioned in § 5 below the chance of misclassification can be allowed for by the introduction of additional parameters, but this seems hardly worth embarking on in the present investigation. Tables 3 and 4 give the amalgamated expected and observed numbers for those parts of the data giving rise to w- and z-type distributions, while analyses of the chains for families with two and three cases are presented in Table 5.


Table 3. Amalgamated frequencies for w-distribution, with a total of K = A + H observations

Time interval in days     Observed no.      Expected no.
        0                      3                1.87
        1                 [illegible]           3.42
        2                      1                2.61
        3                      2                1.64
        4                 [illegible]           0.88
       ≥5                 [illegible]           0.58
      Total                 11 = K             11.00
Table 4. Amalgamated frequencies for z-distribution, with a total of J = B + 2(E + F) observations

Time interval in days     Observed no.      Expected no.
        0                      .                 .
        1                      .                 .
        2                      .                0.01
        3                      .                0.04
        4                      .                0.24
        5                      3                1.01
        6                      5                3.02
   (0 to 6 pooled)             8                4.32
        7                      5                6.64
        8                      6               10.97
        9                     17               14.16
       10                      7               15.00
       11                 [illegible]       [illegible]
       12                     10               11.55
       13                     10                8.87
       14                      5                6.03
       15                      1                3.43
       16                      3                1.54
       17                 [illegible]           0.53
      ≥18                 [illegible]           0.16
      Total                 97 = J             97.00














Table 5. Analysis of chains for two or three cases

Type of introduction | Type of chain | Observed no. | Expected no.
Single               | (1)           |      6       |     8.62
                     | (1²)          |     11       |     6.23
                     | (1³)          |      6       |     9.81
                     | (12)          |     37       |    35.34
                     | Total         |   60 = M     |    60.00
Double               | (2)           |      4       |     3.05
                     | (21)          |      7       |     7.95
                     | Total         |   11 = K     |    11.00
| | 











3. DERIVATION OF SCORES AND INFORMATION FUNCTIONS 


For a detailed discussion of the mathematical model being used reference should be made to
Part I. It is sufficient to repeat here that we assume the latent period, x, to be normally
distributed with mean m and variance $\sigma^2$, and the ensuing infectious period to be of constant
length a. Infection of further susceptibles during this time is taken to be a Poisson process
such that the chance of a given susceptible contracting the disease in time dt is $\lambda\,dt$. The five
main aspects of the data that are fairly easily analysed will now be described in detail.
The w- and z-type distributions arising from the data are easily dealt with according
to the theory already described in Part I. The only difference is that there all contributions
to scores and information functions were amalgamated (see (9) and (11) of Part I),
whereas here it is more convenient to have the individual contributions displayed separately.


w-type distribution 


There are K families giving w-type observations, and the appropriate score and information
functions are
$$S_\sigma = K\sigma^{-1}\left(\tfrac{1}{2}V\sigma^{-2} - 1\right), \tag{1}$$
$$I_{\sigma\sigma} = 2K\sigma^{-2}, \tag{2}$$
where, as before, V is the observed second moment of the distribution about the origin.
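As an illustration of how (1) and (2) are used, the sketch below iterates $\sigma \leftarrow \sigma + S_\sigma/I_{\sigma\sigma}$ for the w-type component alone. The function names and the interval values are hypothetical, chosen only to show the mechanics; they are not Hope Simpson's observations.

```python
def w_score_and_information(w_values, sigma):
    """Score (1) and information (2) for sigma from w-type observations."""
    K = len(w_values)
    V = sum(w * w for w in w_values) / K          # second moment about the origin
    score = K / sigma * (0.5 * V / sigma**2 - 1.0)
    information = 2.0 * K / sigma**2
    return score, information


def fit_sigma(w_values, sigma=1.0, tol=1e-10, max_iter=100):
    """Repeat sigma <- sigma + score / information until the step is negligible."""
    for _ in range(max_iter):
        score, information = w_score_and_information(w_values, sigma)
        step = score / information
        sigma += step
        if abs(step) < tol:
            break
    return sigma


# Hypothetical intervals in days (illustrative only).
w_example = [0, 1, 1, 2, 3, 0, 1, 4, 2, 1, 3]
sigma_hat = fit_sigma(w_example)
print(round(sigma_hat, 4))
```

For this single-parameter component the score vanishes when $\sigma^2 = \tfrac{1}{2}V$, so the iteration can be checked against that closed form.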
z-type distribution 
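Here we have J families in all yielding z-type observations. The four scores are
$$\begin{aligned}
S_\lambda &= J\{m - \bar z + \lambda^{-1} + \lambda\sigma^2 - a(e^{\lambda a}-1)^{-1}\} + \sum_z T_\lambda, \\
S_a &= -J\lambda(e^{\lambda a}-1)^{-1} + \sum_z T_a, \\
S_m &= J\lambda + \sum_z T_m, \\
S_\sigma &= J\lambda^2\sigma + \sum_z T_\sigma,
\end{aligned} \tag{3}$$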









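where $\bar z$ and v are the mean and variance of the observed distribution and $T_i$ is defined as
in Part I and summed over the observed values of z. The corresponding information functions are
$$\begin{aligned}
I_{\lambda\lambda} &= \sum_z T_\lambda^2 - J\{\lambda^2\sigma^4 + \sigma^2 - \lambda^{-2} + a^2e^{\lambda a}(e^{\lambda a}-1)^{-2}\}, \\
I_{\lambda a} &= \sum_z T_\lambda T_a + J\{(1+\lambda^2\sigma^2)(e^{\lambda a}-1)^{-1} - \lambda a\,e^{\lambda a}(e^{\lambda a}-1)^{-2}\}, \\
I_{\lambda m} &= \sum_z T_\lambda T_m - J(1+\lambda^2\sigma^2), \\
I_{\lambda\sigma} &= \sum_z T_\lambda T_\sigma - J\lambda^3\sigma^3, \\
I_{aa} &= \sum_z T_a^2 - J\lambda^2(e^{\lambda a}-1)^{-2}, \\
I_{am} &= \sum_z T_a T_m + J\lambda^2(e^{\lambda a}-1)^{-1}, \\
I_{a\sigma} &= \sum_z T_a T_\sigma + J\lambda^3\sigma(e^{\lambda a}-1)^{-1}, \\
I_{mm} &= \sum_z T_m^2 - J\lambda^2, \\
I_{m\sigma} &= \sum_z T_m T_\sigma - J\lambda^3\sigma, \\
I_{\sigma\sigma} &= \sum_z T_\sigma^2 - J\lambda^4\sigma^2.
\end{aligned} \tag{4}$$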


Chains with double introductions 


We now take the K families with a double introduction, i.e. A of type (2) and H of type
(21). Using the Greenwood rather than the Reed-Frost formulation (see Bailey, 1955, §§ 6
and 7), the relative frequencies are $e^{-\lambda a}$ and $1-e^{-\lambda a}$. Contributions to the scores and
information functions are therefore
$$S_\lambda = Ha(e^{\lambda a}-1)^{-1} - Aa, \qquad S_a = \lambda a^{-1}S_\lambda, \tag{5}$$
$$I_{\lambda\lambda} = Ka^2(e^{\lambda a}-1)^{-1}, \qquad I_{\lambda a} = \lambda a^{-1}I_{\lambda\lambda}, \qquad I_{aa} = \lambda^2a^{-2}I_{\lambda\lambda}. \tag{6}$$
There is no special difficulty in developing corresponding formulae for the Reed-Frost
variant if required, though variation in the chance of infection must then be allowed for.
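The contributions (5) and (6) can be evaluated directly, as in the following sketch. The counts A = 4 and H = 7 are those of Table 5; the trial value of a and the function name are assumptions made for illustration. The sketch also shows that this component on its own fixes only the product $\lambda a$, since the 2 × 2 block formed from (6) has zero determinant.

```python
import math


def double_intro_contributions(A, H, lam, a):
    """Scores (5) and information functions (6) for double introductions."""
    K = A + H
    growth = math.exp(lam * a) - 1.0
    s_lam = H * a / growth - A * a
    s_a = lam / a * s_lam
    i_ll = K * a * a / growth
    i_la = lam / a * i_ll
    i_aa = (lam / a) ** 2 * i_ll
    return (s_lam, s_a), (i_ll, i_la, i_aa)


A, H = 4, 7      # observed numbers of chains (2) and (21), Table 5
a = 7.0          # hypothetical trial value of the infectious period, in days
lam_root = math.log((A + H) / A) / a   # value of lambda at which S_lambda vanishes
scores, info = double_intro_contributions(A, H, lam_root, a)
determinant = info[0] * info[2] - info[1] ** 2
print(scores, determinant)
```

Because the determinant vanishes, the other components of the data are needed to separate $\lambda$ from a.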


Chains with single introductions 


As mentioned above, it is necessary to take into account the possibility of variations in
the chance of infection when analysing the frequencies of the several chains starting with
only a single introductory case. This has already been discussed at length in an earlier paper
(Bailey, 1953). The chief assumption made was that the chance of cross-infection, p, which
when constant equals $1-e^{-\lambda a}$ in our present notation, followed a β-distribution with
parameters x and y, namely,
$$dF = \frac{1}{B(x,y)}\,p^{x-1}(1-p)^{y-1}\,dp \qquad (0<p<1). \tag{7}$$








The scores and observed information functions, repeated here for convenience using the
present symbols for observed quantities, were shown to be
$$\begin{aligned}
S'_x &= \frac{B+E+F}{x} + \frac{E+F}{x+1} - \frac{M}{x+y} - \frac{M}{x+y+1} - \frac{B+E}{x+y+2}, \\
S'_y &= \frac{B+C+E}{y} + \frac{B+C}{y+1} - \frac{M}{x+y} - \frac{M}{x+y+1} - \frac{B+E}{x+y+2},
\end{aligned} \tag{8}$$
$$\begin{aligned}
I'_{xx} &= \frac{B+E+F}{x^2} + \frac{E+F}{(x+1)^2} - W, \\
I'_{xy} &= -W, \\
I'_{yy} &= \frac{B+C+E}{y^2} + \frac{B+C}{(y+1)^2} - W,
\end{aligned} \tag{9}$$
where
$$W = \frac{M}{(x+y)^2} + \frac{M}{(x+y+1)^2} + \frac{B+E}{(x+y+2)^2}.$$





Primes are used here to distinguish these auxiliary scores and information functions from 
those with which we are more immediately concerned. 

This procedure entails using two parameters to specify the probability of infection. The
parameter $\lambda$, which we are already using to score data that are relatively insensitive to
variations in the chance of infection, can be regarded as an average probability. We therefore
need to introduce one additional parameter. The average value of p for the distribution (7)
is $\bar p = x/(x+y)$, so that we can write
$$\frac{x}{x+y} = 1 - e^{-\lambda a},$$
or
$$x = y(e^{\lambda a}-1). \tag{10}$$


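We can now suppose the expected frequencies to have been written in terms of $\lambda$, a and y
instead of x and y. The usual processes of differentiation, using the functional relation (10),
then lead to the following expressions for the scores and information functions as required
in the present context, written in terms of those given in (8) and (9):
$$S_\lambda = a(x+y)S'_x, \qquad S_a = \lambda(x+y)S'_x, \qquad S_y = \frac{x}{y}S'_x + S'_y, \tag{11}$$
$$\begin{aligned}
I_{\lambda\lambda} &= a^2(x+y)\{(x+y)I'_{xx} - S'_x\}, \\
I_{\lambda a} &= \lambda a^{-1}I_{\lambda\lambda} - (x+y)S'_x, \\
I_{\lambda y} &= a(x+y)y^{-1}(xI'_{xx} + yI'_{xy} - S'_x), \\
I_{aa} &= \lambda^2a^{-2}I_{\lambda\lambda}, \\
I_{ay} &= \lambda a^{-1}I_{\lambda y}, \\
I_{yy} &= \left(\frac{x}{y}\right)^2 I'_{xx} + 2\left(\frac{x}{y}\right)I'_{xy} + I'_{yy}.
\end{aligned} \tag{12}$$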
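The chain-rule step behind (11) can be checked numerically. The sketch below assumes that the observed numbers of chains (1), (1²), (1³) and (12) are C, B, E and F respectively (C for chain (1) is inferred from the totals, not stated in the text), writes the log-likelihood implied by the β-distribution (7), and compares $S_\lambda = a(x+y)S'_x$ with a finite-difference derivative. Trial values are taken near the estimates quoted in § 4.

```python
import math


def log_likelihood_xy(x, y, C, B, E, F):
    """Log-likelihood, up to a constant, of chains (1), (1^2), (1^3), (12)."""
    M = C + B + E + F
    return ((B + E + F) * math.log(x) + (E + F) * math.log(x + 1)
            + (C + B + E) * math.log(y) + (C + B) * math.log(y + 1)
            - M * math.log(x + y) - M * math.log(x + y + 1)
            - (B + E) * math.log(x + y + 2))


def aux_score_x(x, y, C, B, E, F):
    """Auxiliary score S'_x: derivative of the log-likelihood with respect to x."""
    M = C + B + E + F
    return ((B + E + F) / x + (E + F) / (x + 1)
            - M / (x + y) - M / (x + y + 1) - (B + E) / (x + y + 2))


def score_lambda(lam, a, y, counts):
    x = y * (math.exp(lam * a) - 1.0)                  # relation (10)
    return a * (x + y) * aux_score_x(x, y, *counts)    # first member of (11)


counts = (6, 11, 6, 37)          # C, B, E, F as read from Table 5
lam, a, y = 0.18, 7.0, 0.56      # trial values

def loglik_in_lambda(l):
    return log_likelihood_xy(y * (math.exp(l * a) - 1.0), y, *counts)

h = 1e-6
numeric = (loglik_in_lambda(lam + h) - loglik_in_lambda(lam - h)) / (2 * h)
analytic = score_lambda(lam, a, y, counts)
print(numeric, analytic)
```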


Time-interval distribution for triple introductions 





The last item to be discussed is the extraction of information from the G families in- 
volving a triple introduction. Each basic observation is the number-pair (u,v), where u<v. 
Let the times at which symptoms occur in the three patients, measuring from the common 










point of infection as origin, be $\xi_1$, $\xi_2$ and $\xi_3$. Then these variables are normally and
independently distributed with mean m + a and variance $\sigma^2$. The joint-frequency distribution
of the ordered trio $(\xi_1, \xi_2, \xi_3)$ is
$$f(\xi_1<\xi_2<\xi_3)\,d\xi_1\,d\xi_2\,d\xi_3 = \frac{3!}{(2\pi)^{3/2}\sigma^3}\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{3}(\xi_i-m-a)^2\right\}d\xi_1\,d\xi_2\,d\xi_3. \tag{13}$$
If we now use the transformation
$$u = \xi_2-\xi_1, \qquad v = \xi_3-\xi_2, \qquad s = \xi_1+\xi_2+\xi_3, \tag{14}$$
to write down the joint distribution of u, v and s, we can then integrate out s to give the
required joint distribution of u and v only. This turns out to be
$$f(u,v)\,du\,dv = \frac{\sqrt{3}}{\pi\sigma^2}\exp\left\{-\frac{1}{3\sigma^2}(u^2+uv+v^2)\right\}du\,dv. \tag{15}$$
We can now derive a score and information function for $\sigma$ in the usual way. If for a set of
observed number-pairs $(u_i, v_i)$ (i = 1, ..., G), we put
$$Z = \sum_{i=1}^{G}(u_i^2+u_iv_i+v_i^2), \tag{16}$$
we obtain
$$S_\sigma = -2G\sigma^{-1} + \tfrac{2}{3}Z\sigma^{-3}, \qquad I_{\sigma\sigma} = 4G\sigma^{-2}. \tag{17}$$


These results are, moreover, not affected by variations in the chance of infection. 
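A small simulation gives a rough check on the triple-introduction result. Setting the score for $\sigma$ to zero, taken here in the form $S_\sigma = -2G\sigma^{-1} + \tfrac{2}{3}Z\sigma^{-3}$, gives $\hat\sigma^2 = Z/(3G)$. The sketch below draws three independent normal onset times per family, forms the successive intervals u and v, and recovers $\sigma$. The sample size, seed and parameter values are arbitrary assumptions.

```python
import math
import random


def sigma_from_triples(pairs):
    """Root of the score equation for sigma: sigma^2 = Z / (3G)."""
    G = len(pairs)
    Z = sum(u * u + u * v + v * v for u, v in pairs)
    return math.sqrt(Z / (3.0 * G))


def simulate_pairs(G, m, a, sigma, rng):
    """Successive intervals between the ordered onset times of three co-primaries."""
    pairs = []
    for _ in range(G):
        t1, t2, t3 = sorted(rng.gauss(m + a, sigma) for _ in range(3))
        pairs.append((t2 - t1, t3 - t2))
    return pairs


rng = random.Random(1956)
pairs = simulate_pairs(20000, 7.6, 7.0, 1.6, rng)
sigma_hat = sigma_from_triples(pairs)
print(round(sigma_hat, 3))
```

The estimate depends only on the intervals, not on m or a, which is why this component contributes to $\sigma$ alone.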

This now completes the derivation of all the various components of the scores and
information functions required for obtaining maximum-likelihood estimates of the five
parameters $\lambda$, a, m, $\sigma$ and y. The only elements of the 5 × 5 information matrix which have
not been mentioned explicitly are $I_{my}$ and $I_{\sigma y}$, and these are easily seen to be identically
zero. In analysing actual data we first classify them according to the five main aspects
above. Next, we calculate in each case, for trial values of the parameters, the appropriate
contributions to scores and information functions. Corresponding contributions are then
added and the resultant vector of overall scores multiplied by the inverse of the complete
information matrix to give first corrections to the trial values. The process is then repeated
as usual until the desired accuracy is achieved.
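The iterative procedure just described can be sketched generically: at each cycle the corrections are the solution of (information matrix) × (corrections) = (scores). The code below is a minimal illustration, not the computation actually carried out for the measles data. The helper names are hypothetical, and the two-parameter normal example (mean and standard deviation, with expected information) is a stand-in chosen because its answer is known in closed form.

```python
def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def scoring(theta, score_and_information, tol=1e-10, max_iter=200):
    """Add I^{-1} S to the trial values until the corrections are negligible."""
    for _ in range(max_iter):
        scores, information = score_and_information(theta)
        corrections = solve(information, scores)
        theta = [t + d for t, d in zip(theta, corrections)]
        if max(abs(d) for d in corrections) < tol:
            break
    return theta


def normal_components(sample):
    """Scores and expected information for a normal sample (stand-in example)."""
    n = len(sample)

    def score_and_information(theta):
        mu, sigma = theta
        ss = sum((x - mu) ** 2 for x in sample)
        scores = [sum(x - mu for x in sample) / sigma**2,
                  -n / sigma + ss / sigma**3]
        information = [[n / sigma**2, 0.0], [0.0, 2.0 * n / sigma**2]]
        return scores, information

    return score_and_information


sample = [9.0, 11.0, 10.0, 12.0, 8.0, 10.0]
mu_hat, sigma_hat = scoring([5.0, 3.0], normal_components(sample))
print(mu_hat, sigma_hat)
```

In the five-parameter problem the same loop would apply, with the scores and information matrix assembled by adding the contributions of the five aspects of the data.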


4. ILLUSTRATIVE EXAMPLE 


The foregoing maximum-likelihood scores and information functions can now be applied 
to Hope Simpson’s data, the relevant aspects of which are exhibited in Tables 1-5. A pre- 
liminary inspection of the data shows that so far as the z-distribution in Table 4 goes a really 
satisfactory fit is unlikely to be obtained on account of the apparent excess of observations 
at 11 days with a counterbalancing deficit at 10 days. In Part I a difficulty of this sort was 
encountered with families of two in the appearance of possible spurious peaks at 7 and 14 
days. This was thought to be due to an unconscious bias associated with integral multiples 
of a week. In the present case no very satisfactory explanation can be discerned. However, 
as the agreement between hypothesis and observation seems in other respects to be quite 
good, the best plan seems to be to pool the frequencies for 10 and 11 days when proceeding 
to test goodness-of-fit. 











Preliminary estimates for $\lambda$, a, m and $\sigma$ were taken from the final values previously
obtained for families of two, while a trial value of y was available from an earlier analysis
of data from Providence, Rhode Island (Bailey, 1953). After carrying out the standard
procedure of maximum-likelihood scoring, the final estimates turned out to be
$$\begin{aligned}
\hat\lambda &= 0.180 \pm 0.039, \\
\hat a &= 7.05 \pm 1.13 \text{ days}, \\
\hat m &= 7.63 \pm 0.50 \text{ day}, \\
\hat\sigma &= 1.59 \pm 0.26 \text{ day}, \\
\hat y &= 0.56 \pm 0.32.
\end{aligned} \tag{18}$$





These estimates of $\lambda$, a, m and $\sigma$ are rather less precise than those given in equation (21)
of Part I for families of two, but it may be noted that in no case are the two estimates of
any parameter significantly different. The parameter y is not determined with much precision,
although the estimate obtained does suggest an appreciable amount of variation in
the chance of infection from family to family.

In testing goodness-of-fit, based as in Part I on the last set of estimates but one in the
iteration, the classes bracketed together have been pooled to avoid small expectations.
We have also amalgamated the frequencies for 10 and 11 days in the z-distribution as
indicated above. The total number of degrees of freedom is 13, i.e. 1 from the w-distribution
in Table 3, 8 from the z-distribution in Table 4 and 4 in all from Table 5, while 5 must be
removed to allow for the parameters estimated. We actually obtain an overall $\chi^2$ of 14.6
on 8 D.F. As the 5 % point is at 15.5 we can regard the fit as reasonably satisfactory, except
of course for the anomalous behaviour of the frequencies in the 10- and 11-day classes of the
z-distribution. Whether this is due to some bias in collecting the data, or whether it is of
genuine biological significance is a matter which requires further investigation, and should
be given special attention when new data of this type are collected.
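The comparison with the 5 % point can be reproduced without tables. For an even number of degrees of freedom the upper tail of $\chi^2$ has a closed form (a finite Poisson sum), which the sketch below uses. The function name is hypothetical; the figures 14.6, 15.5 and 13 − 5 = 8 are those quoted above.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """P(chi-square on df degrees of freedom exceeds x), for even df only."""
    if df % 2 != 0:
        raise ValueError("this closed form needs an even number of degrees of freedom")
    half = x / 2.0
    return math.exp(-half) * sum(half**k / math.factorial(k) for k in range(df // 2))


df = 13 - 5                                   # classes less parameters estimated
p_observed = chi2_upper_tail_even_df(14.6, df)
p_at_quoted_point = chi2_upper_tail_even_df(15.5, df)
print(round(p_observed, 3), round(p_at_quoted_point, 3))
```

The observed value falls just short of the 5 % point, in line with the description of the fit as reasonably satisfactory.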

Some consideration should be given at this point to the consequences of neglecting the
effect of variations in $\lambda$ on the form of the z-distribution. Using the method of § 5 in Part I,
it can be shown that the fairly substantial variation envisaged there would in the present
case actually improve the fit very slightly, reducing $\chi^2$ by about 0.5.


5. EXTENSION TO LARGER FAMILIES 


The general procedure described here for families of three can clearly be extended to larger 
families. By picking out contributions to w- and z-type distributions we could use the scores 
and information functions given in (1)—(4) directly. When analysing the different kinds of 
chain allowing for a variable chance of infection the formulae given above in (10)-(12) 
would have to be applied to the extensions of (8) and (9) indicated in detail in Bailey (1953). 
Double and triple introductions can also be dealt with as above. Other multiple introductions 
require the obvious extensions of (13)—(17). We have in the present paper neglected to make 
use of the distribution of v-intervals in chains of type (21), because of undue complexity. 
With larger families there would be further relatively intractable items of this sort. Another 
point of importance is that with small families the errors introduced by neglecting to make
allowance for the probability of chains being misclassified are likely to be small. But in
larger families this source of error would be much more pronounced because of the greater
opportunity for the distributions of different kinds of chains to overlap. If sufficient data
of this kind are forthcoming the difficulty could be tackled along the lines suggested in § 6
of Part I.


It is again a pleasure to acknowledge my indebtedness to Dr R. E. Hope Simpson, of the 
Cirencester Public Health Laboratory Service, for making available to me the epidemio- 
logical records used for the illustrative example in §4, and to express my thanks to 
Mrs Tamara Hazlewood for undertaking the computations required in obtaining the 
numerical results. 


REFERENCES 


BarLEy, N. T. J. (1953). The use of chain-binomials with a variable chance of infection for the analysis 
of intra-household epidemics. Biometrika, 40, 279. 

BattEy, N. T. J. (1955). Some problems in the statistical analysis of epidemic data. J. R. Statist. 
Soc. B. 17, 35. 

Battey, N. T. J. (1956). On estimating the latent and infectious periods of measles. I. Families with 
two susceptibles only. Biometrika, 43, 1. 













SIGNIFICANCE TESTS FOR A VARIABLE CHANCE OF INFECTION 
IN CHAIN-BINOMIAL THEORY 


By NORMAN T. J. BAILEY 
Design and Analysis of Scientific Experiment, 6 Keble Road, Oxford 


In their analysis of measles data for Providence, Rhode Island, Wilson, Bennett, Allen &
Worcester (1939) found that, although Greenwood's chain-binomial model (Greenwood,
1931) fitted satisfactorily the distribution of the total number of cases in families of given
size, this theory was inadequate when the data were analysed according to the different
types of chains involved. Greenwood (1949) suggested that this might be due to variations
in the chance of infection, p, between different households. Subsequently it was shown by
Bailey (1953) that good agreement between theory and observation could be obtained on
the assumption that p varied according to the β-distribution
$$dF = \frac{1}{B(x,y)}\,p^{x-1}(1-p)^{y-1}\,dp \qquad (0<p<1). \tag{1}$$


The appropriate scores and information functions were given for estimating the two 
parameters, x and y, in households of three and four susceptibles. 

Now if in fact p is constant, x and y are both infinite and the maximum-likelihood method
breaks down. Dr Armitage pointed this out in the discussion on my Royal Statistical Society
paper (Bailey, 1955), and suggested that one might find it preferable to work with the
reciprocals $x' = x^{-1}$ and $y' = y^{-1}$. This is certainly more satisfactory, but the maximum-likelihood
scoring technique again fails in the limit as x' and y' tend to zero, since the scores
contain terms proportional to $x'^{-1}$ and $y'^{-1}$. We cannot derive an adequate significance test
for variation in the chance of infection without making some assumption about the limiting
form of the ratios y/x and y'/x'. It follows from (1) that the average chance of infection,
$\bar p$, is given by
$$\bar p = \frac{x}{x+y} = \left(1+\frac{x'}{y'}\right)^{-1}. \tag{2}$$
One way out of the difficulty is therefore to use $\bar p$ as one parameter, and x' or y' as the other.
The only drawback here is the algebraic complexity of the scores and information functions.

An alternative and much simpler formulation is obtained by writing
$$x = p/z, \qquad y = q/z, \tag{3}$$
where, for convenience, we have dropped the bar from p, and p = 1 − q.

Table 1 gives the expected frequencies of different types of chains in families of three 
for the Greenwood model, the modified model involving x and y, and the alternative form 
of the latter using p and z. Observations from the Providence measles data are also included. 


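Scores for p and z are
$$\begin{aligned}
S_p &= \frac{b+c+d}{p} - \frac{a+b+c}{q} + \frac{c+d}{p+z} - \frac{a+b}{q+z}, \\
S_z &= -\frac{n}{1+z} - \frac{2(b+c)}{1+2z} + \frac{c+d}{p+z} + \frac{a+b}{q+z},
\end{aligned} \tag{4}$$
while the observed information functions are
$$\begin{aligned}
I_{pp} &= \frac{b+c+d}{p^2} + \frac{a+b+c}{q^2} + \frac{c+d}{(p+z)^2} + \frac{a+b}{(q+z)^2}, \\
I_{pz} &= \frac{c+d}{(p+z)^2} - \frac{a+b}{(q+z)^2}, \\
I_{zz} &= -\frac{n}{(1+z)^2} - \frac{4(b+c)}{(1+2z)^2} + \frac{c+d}{(p+z)^2} + \frac{a+b}{(q+z)^2}.
\end{aligned} \tag{5}$$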
The expressions appearing in (4) and (5) are particularly suited to rapid computation, since
for trial values of p and z the various quotients involved in each line can be accumulated
directly on a calculating machine, the squares of denominators in the case of information
functions being read directly from Barlow's tables.








Table 1. Expected and observed numbers of chains in families of three susceptibles

Type of | Expected nos. on | Expected nos. on modified model                                  | Observed | Providence
chain   | Greenwood model  | In terms of x and y              | In terms of p and z         | nos.     | measles data
1       | nq²              | ny(y+1) / {(x+y)(x+y+1)}         | nq(q+z) / (1+z)             | a        |  34
1²      | 2npq²            | 2nxy(y+1) / {(x+y)(x+y+1)(x+y+2)} | 2npq(q+z) / {(1+z)(1+2z)}  | b        |  25
1³      | 2np²q            | 2nx(x+1)y / {(x+y)(x+y+1)(x+y+2)} | 2npq(p+z) / {(1+z)(1+2z)}  | c        |  36
12      | np²              | nx(x+1) / {(x+y)(x+y+1)}         | np(p+z) / (1+z)             | d        | 239
Total   | n                | n                                | n                           | n        | 334





Suppose, however, we are merely concerned to test for the existence of variability in the
chance of infection, i.e. to examine the hypothesis that z = 0. Then we can avoid the full
scoring procedure involved in estimating p and z jointly by basing the significance test on
the distribution of the score $S_z$, calculated at the values z = 0 and $p = \hat p_0$, where $\hat p_0$ is the
maximum-likelihood estimate of p when z = 0. Thus
$$\hat p_0 = \frac{b+2c+2d}{2a+3b+3c+2d}, \tag{6}$$
and
$$S_z(\hat p_0, 0) = \frac{c+d}{\hat p_0} + \frac{a+b}{1-\hat p_0} - (a+3b+3c+d). \tag{7}$$
The large sample variance of $S_z(\hat p_0, 0)$ is easily shown to be $[I_{zz} - I_{pz}^2/I_{pp}]_{z=0}$. Using expected
information functions we therefore have
$$\operatorname{var} S_z(\hat p_0, 0) = \frac{n(3-7pq)}{1+pq}. \tag{8}$$











The data shown in Table 1 give $\hat p_0 = 0.789$ and $S_z = 172 \pm 23$. This result is strongly
significant, as expected from the earlier investigation (Bailey, 1953).
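The short test for families of three is easily programmed. The sketch below takes the estimate of p under z = 0 as (b + 2c + 2d)/(2a + 3b + 3c + 2d), the score as (c + d)/p + (a + b)/q − (a + 3b + 3c + d), and the variance as n(3 − 7pq)/(1 + pq). This last form is my reading of the partly illegible expression (8), obtained from the expected information functions, so it should be treated as an assumption. The count d = 239 is the total of 334 less the other three classes; the function name is hypothetical.

```python
import math


def short_test_three(a, b, c, d):
    """Estimate of p under z = 0, score for z, and its standard error."""
    n = a + b + c + d
    p0 = (b + 2 * c + 2 * d) / (2 * a + 3 * b + 3 * c + 2 * d)
    q0 = 1.0 - p0
    score = (c + d) / p0 + (a + b) / q0 - (a + 3 * b + 3 * c + d)
    variance = n * (3.0 - 7.0 * p0 * q0) / (1.0 + p0 * q0)
    return p0, score, math.sqrt(variance)


# Providence measles data, chains 1, 1^2, 1^3, 12 in families of three.
p0, score, se = short_test_three(34, 25, 36, 239)
print(round(p0, 3), round(score), round(se))
```

With these inputs the sketch returns values that round to the figures quoted in the text.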

Similar results are easily obtained for larger families. Table 2 gives the expectations and 
observations appropriate to families of four. The frequencies expected in terms of x and y 
have been omitted for reasons of space, but have been set out before in Table 4 of my 1953 
paper. This time the scores for p and z are 















































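$$\begin{aligned}
S_p ={}& \frac{n-a}{p} - \frac{n-h}{q} + \frac{n-a-b}{p+z} + \frac{e+f+g+h}{p+2z} - \frac{n-g-h}{q+z} - \frac{a+b+c+e}{q+2z} - \frac{b+c}{q+3z}, \\
S_z ={}& -\frac{n}{1+z} - \frac{2n}{1+2z} - \frac{3(n-a-h)}{1+3z} - \frac{4(b+c+e+f)}{1+4z} - \frac{5(c+e)}{1+5z} \\
& + \frac{n-a-b}{p+z} + \frac{2(e+f+g+h)}{p+2z} + \frac{n-g-h}{q+z} + \frac{2(a+b+c+e)}{q+2z} + \frac{3(b+c)}{q+3z},
\end{aligned} \tag{9}$$
with information functions
$$\begin{aligned}
I_{pp} ={}& \frac{n-a}{p^2} + \frac{n-h}{q^2} + \frac{n-a-b}{(p+z)^2} + \frac{e+f+g+h}{(p+2z)^2} + \frac{n-g-h}{(q+z)^2} + \frac{a+b+c+e}{(q+2z)^2} + \frac{b+c}{(q+3z)^2}, \\
I_{pz} ={}& \frac{n-a-b}{(p+z)^2} + \frac{2(e+f+g+h)}{(p+2z)^2} - \frac{n-g-h}{(q+z)^2} - \frac{2(a+b+c+e)}{(q+2z)^2} - \frac{3(b+c)}{(q+3z)^2}, \\
I_{zz} ={}& -\frac{n}{(1+z)^2} - \frac{4n}{(1+2z)^2} - \frac{9(n-a-h)}{(1+3z)^2} - \frac{16(b+c+e+f)}{(1+4z)^2} - \frac{25(c+e)}{(1+5z)^2} \\
& + \frac{n-a-b}{(p+z)^2} + \frac{4(e+f+g+h)}{(p+2z)^2} + \frac{n-g-h}{(q+z)^2} + \frac{4(a+b+c+e)}{(q+2z)^2} + \frac{9(b+c)}{(q+3z)^2}.
\end{aligned} \tag{10}$$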


Again we see that the scores and information functions are in a form suitable for easy 
computation. 


The short significance test for variation in infectiousness is now given by calculating 


$$S_z(\hat p_0, 0) = \frac{(c+d)+3(e+f+g+h)}{\hat p_0} + \frac{3(a+e)+6(b+c)+(d+f)}{1-\hat p_0} - (3a+10b+15c+6d+15e+10f+6g+3h), \tag{11}$$
where
$$\hat p_0 = \frac{b+2(c+d)+3(e+f+g+h)}{3a+5b+6c+4d+6e+5f+4g+3h}. \tag{12}$$


This time the expression for the variance of the score does not appear to reduce to a convenient
simple formula, but we can still avoid the full iterative procedure by using, as before,
$$\operatorname{var} S_z(\hat p_0, 0) = [I_{zz} - I_{pz}^2/I_{pp}]_{z=0}, \tag{13}$$
but where, for z = 0, we now substitute the observed information functions given by
$$\begin{aligned}
I_{pp} ={}& \frac{b+2(c+d)+3(e+f+g+h)}{\hat p_0^2} + \frac{3(a+e)+4(b+c)+2(d+f)+g}{(1-\hat p_0)^2}, \\
I_{pz} ={}& \frac{(c+d)+3(e+f+g+h)}{\hat p_0^2} - \frac{3(a+e)+6(b+c)+(d+f)}{(1-\hat p_0)^2}, \\
I_{zz} ={}& \frac{(c+d)+5(e+f+g+h)}{\hat p_0^2} + \frac{5(a+e)+14(b+c)+(d+f)}{(1-\hat p_0)^2} \\
& - \{5(a+h)+30(b+f)+55(c+e)+14(d+g)\}.
\end{aligned} \tag{14}$$










If the Reed-Frost or Lidwell variant of the chain-binomial model (see Bailey, 1953, 1955)
were thought necessary, certain obvious modifications of the above would be required.
Application of these formulae to the data in Table 2 gives $\hat p_0 = 0.791$ and $S_z = 157 \pm 41$,
again a highly significant result.
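The corresponding calculation for families of four can be sketched in the same way. The coefficients below follow from the powers of p and q in the Greenwood expectations of Table 2 (so the constant term carries 15c). The observed counts are an assumption: only a = 4, c = 1, f = 3 and the total of 100 are legible in this copy of Table 2, and the remaining entries used here are values consistent with those and with the quoted results, so they should be checked against Bailey (1953) before being relied on.

```python
import math


def short_test_four(a, b, c, d, e, f, g, h):
    """Estimate of p under z = 0, score for z, and its standard error."""
    p_power = b + 2 * (c + d) + 3 * (e + f + g + h)
    p0 = p_power / (3*a + 5*b + 6*c + 4*d + 6*e + 5*f + 4*g + 3*h)
    q0 = 1.0 - p0
    u = (c + d) + 3 * (e + f + g + h)
    w = 3 * (a + e) + 6 * (b + c) + (d + f)
    score = u / p0 + w / q0 - (3*a + 10*b + 15*c + 6*d + 15*e + 10*f + 6*g + 3*h)
    # Observed information functions at z = 0.
    i_pp = p_power / p0**2 + (3 * (a + e) + 4 * (b + c) + 2 * (d + f) + g) / q0**2
    i_pz = u / p0**2 - w / q0**2
    i_zz = (((c + d) + 5 * (e + f + g + h)) / p0**2
            + (5 * (a + e) + 14 * (b + c) + (d + f)) / q0**2
            - (5 * (a + h) + 30 * (b + f) + 55 * (c + e) + 14 * (d + g)))
    variance = i_zz - i_pz**2 / i_pp
    return p0, score, math.sqrt(variance)


# Assumed counts for chains 1, 1^2, 1^3, 12, 1^4, 1^2 2, 121, 13 (see lead-in).
counts = (4, 3, 1, 8, 4, 3, 10, 67)
p0, score, se = short_test_four(*counts)
print(round(p0, 3), round(score), round(se))
```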
The modification discussed above can also be applied to a recent development of chain-
binomial theory (Bailey, 1956) in which account is taken of variations in the incubation
period and infectiousness is not confined to an instant but is allowed to persist for a fixed


Table 2. Expected and observed numbers in families of four susceptibles

Type of | Expected nos. on | Expected nos. on modified model                                   | Observed | Providence
chain   | Greenwood model  |                                                                   | nos.     | measles data
1       | nq³              | nq(q+z)(q+2z) / {(1+z)(1+2z)}                                     | a        | 4
1²      | 3npq⁴            | 3npq(q+z)(q+2z)(q+3z) / {(1+z)(1+2z)(1+3z)(1+4z)}                 | b        | [illegible]
1³      | 6np²q⁴           | 6npq(p+z)(q+z)(q+2z)(q+3z) / {(1+z)(1+2z)(1+3z)(1+4z)(1+5z)}      | c        | 1
12      | 3np²q²           | 3npq(p+z)(q+z) / {(1+z)(1+2z)(1+3z)}                              | d        | [illegible]
1⁴      | 6np³q³           | 6npq(p+z)(p+2z)(q+z)(q+2z) / {(1+z)(1+2z)(1+3z)(1+4z)(1+5z)}      | e        | [illegible]
1²2     | 3np³q²           | 3npq(p+z)(p+2z)(q+z) / {(1+z)(1+2z)(1+3z)(1+4z)}                  | f        | 3
121     | 3np³q            | 3npq(p+z)(p+2z) / {(1+z)(1+2z)(1+3z)}                             | g        | [illegible]
13      | np³              | np(p+z)(p+2z) / {(1+z)(1+2z)}                                     | h        | [illegible]
Total   | n                | n                                                                 | n        | 100





interval of time. This model uses four basic parameters. In families of two the data are
insensitive to variations in infectiousness between families, but with families of three
this must be allowed for in analysing that part of the data which yields chains of
the types shown in Table 1. One additional parameter was therefore introduced: the
quantity called y here. Difficulties again arise if we wish to test whether $y' = y^{-1}$ is
significantly different from zero. The solution is to use z for the additional parameter, as above.
We therefore have auxiliary scores and information functions $S'_p$, $S'_z$, $I'_{pp}$, $I'_{pz}$ and $I'_{zz}$ given
by (4) and (5) above with primes added, corresponding to (8) and (9) of the 1956 paper. The
relation $p = 1-e^{-\lambda a}$ is then used to derive the contributions to scores and information
functions for the parameters $\lambda$, a and z. We thus have
$$S_\lambda = aqS'_p, \qquad S_a = \lambda qS'_p, \qquad S_z = S'_z, \tag{15}$$
and
$$\begin{aligned}
I_{\lambda\lambda} &= a^2q(qI'_{pp} + S'_p), & I_{\lambda a} &= \lambda a^{-1}I_{\lambda\lambda} - qS'_p, \\
I_{\lambda z} &= aqI'_{pz}, & I_{aa} &= \lambda^2a^{-2}I_{\lambda\lambda}, \\
I_{az} &= \lambda a^{-1}I_{\lambda z}, & I_{zz} &= I'_{zz},
\end{aligned} \tag{16}$$
corresponding to (11) and (12) of the 1956 paper. We could find maximum-likelihood
estimates of the four parameters $\lambda$, a, m and $\sigma$, given z = 0, and then test the significance of
z by comparing $S_z$ with its standard error. However, it is probably easier in practice first
to test z on the chained part of the data only, using expressions (6), (7) and (8) above.
We then proceed to estimate all five parameters, using as a trial value of z either zero, if
the preliminary test is not significant, or the actual estimate from this part of the data,
based on (4) and (5), if it is.


REFERENCES 


BAILEY, N. T. J. (1953). The use of chain-binomials with a variable chance of infection for the
analysis of intra-household epidemics. Biometrika, 40, 279.

BAILEY, N. T. J. (1955). Some problems in the statistical analysis of epidemic data. J. R. Statist.
Soc. B, 17, 35.

BAILEY, N. T. J. (1956). On estimating the latent and infectious periods of measles. II. Families with
three or more susceptibles. Biometrika, 43, 322.

GREENWOOD, M. (1931). On the statistical measure of infectiousness. J. Hyg., Camb., 31, 336.

GREENWOOD, M. (1949). The infectiousness of measles. Biometrika, 36, 1.

WILSON, E. B., BENNETT, C., ALLEN, M. & WORCESTER, J. (1939). Measles and scarlet fever in
Providence, R.I., 1929-34 with respect to age and size of family. Proc. Amer. Phil. Soc., 80, 357.













ON THE VARIATION OF YIELD VARIANCE WITH PLOT SIZE 


By P. WHITTLE 


Applied Mathematics Laboratory, New Zealand Department of Scientific 
and Industrial Research, Wellington 


The problem examined is that of evaluating the spatial covariance function of yield density, from a
knowledge of the way yield variance varies with plot size and shape. Results are obtained in §3 for
several kinds of plot. Results are also obtained (§4) on the dependence of the yield variance on plot
geometry for very small and very large plots. Special attention is paid to the case for which the
covariance follows a power law at large distances.


1. INTRODUCTION


It is well known that, in order to explain the observed variation of yield variance with size 
and shape of plot, it is necessary to allow the possibility of correlation between yield den- 
sities at any two points in the plot. (We shall restrict ourselves to the stationary case, for 
which the expected yield density is constant over the area.) Moreover, it appears that this 
spatial correlation must often fall off relatively slowly with increasing distance between 
the two points; as a power function of the distance rather than as an exponential. 

The same behaviour is shown by observations on yarn diameter, flood height (Feller, 
1951), and response from population samples. The calculations of this article will apply to 
these cases, too, but for concreteness we shall continue to speak of plots and yields (although 
we shall sometimes use the word ‘region’ instead of ‘plot’, indicating that we do not confine 
ourselves to two dimensions). 

The type of calculation most usually made is to evaluate the yield variance for a plot of
definite size and shape, and for a given spatial covariance function $\rho(s)$. However, the
inverse calculation would probably be more useful in general: to determine the covariance
function from a knowledge of yield variance as a function of plot geometry. Such a procedure
would enable one to make use of experimental results to obtain at least a partial
estimate of $\rho(s)$. We consider this question in § 3.

A solution of one form of problem is most easily reached by using a Mellin transform, so 
that the covariance which falls off as a power of the distance, 


p(s)~C|s|~” (s large), (1) 
makes an unforced début. 

There is a good deal of evidence to show that many observed covariance functions are of
the form (1) (Fairfield Smith, 1938). This is all the more interesting, in view of the fact that
the simpler linear models of yield variation do not predict a power law, but rather a much
more quickly diminishing law of the type


p(s)~C|s|~e-*!s! (8 large). (2) 


We shall pursue this question further in § 4. 

It is easy to show that the ‘yield’ variance of a large one-dimensional interval is 
asymptotically proportional to a power of the length of the interval if the covariance func- 
tion is of the form (1), but analogous results are not obtained quite so immediately in two 
or more dimensions. This topic is also covered in §§ 3 and 4. 




2. FORMULAE FOR THE YIELD VARIANCE


We shall use a vector co-ordinate x to denote points in the plane. Let us consider a plot,
or region, $\Omega$, with area A and yield $y(\Omega)$. (We use the words 'plane' and 'area', but most
formulae of the paper apply to regions in n-dimensional space, n = 1, 2, 3, ....)

Consider two regions $\Omega_A$ and $\Omega_B$ which contract to zero in such a way that ultimately they
constitute two points a distance s apart. We shall define the covariance function as
$$\rho(\mathbf{s}) = \lim \frac{\operatorname{cov}[y(\Omega_A), y(\Omega_B)]}{A(\Omega_A)\,A(\Omega_B)}, \tag{3}$$
and shall restrict ourselves to cases for which this limit exists.
If $V(\Omega)$ is the population variance of $y(\Omega)$, then it follows from definition (3) that
$$V(\Omega) = \int_\Omega\int_\Omega \rho(\mathbf{x}_1-\mathbf{x}_2)\,d\mathbf{x}_1\,d\mathbf{x}_2. \tag{4}$$


If $\rho(\mathbf{s})$ is continuous at the origin then we can draw one immediate conclusion from (4):
$$V(\Omega) \sim \rho(0)A^2 \qquad (\Omega \text{ small}), \tag{5}$$
that is, for small regions (i.e. for regions which are small in all their dimensions), V is
proportional to the square of the area. This is to be contrasted with the case of zero spatial
correlation, when V is proportional to area under all circumstances.


For the one-dimensional case, when $\Omega$ consists of a segment of a line of length a, then
(4) simplifies to
$$V(a) = 2\int_0^a (a-x)\rho(x)\,dx. \tag{6}$$
For the case when $\Omega$ consists of a rectangle in a plane, with sides a, b parallel to the
co-ordinate axes, we have similarly
$$V(a,b) = 4\int_0^b\int_0^a (a-x)(b-y)\rho(x,y)\,dx\,dy. \tag{7}$$
The integral (6) may be evaluated for the commoner choices of $\rho(s)$, but (7) and other
two-dimensional formulae can generally be evaluated only for rather unlikely forms of the
function $\rho(s)$.
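As a worked instance of the one-dimensional formula, taken here in the form $V(a) = 2\int_0^a (a-x)\rho(x)\,dx$, the sketch below evaluates the integral numerically for an exponential covariance $\rho(x) = e^{-x}$ and compares it with the closed form $2(a - 1 + e^{-a})$. The choice of covariance and the function names are assumptions made for illustration. For small a the result also shows the small-plot behaviour $V \sim \rho(0)a^2$.

```python
import math


def yield_variance_1d(rho, a, steps=20000):
    """V(a) = 2 * integral over (0, a) of (a - x) rho(x) dx, by the midpoint rule."""
    h = a / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += (a - x) * rho(x)
    return 2.0 * h * total


def rho_exponential(x):
    """Assumed covariance function, for illustration only."""
    return math.exp(-x)


def closed_form(a):
    """Exact yield variance for the exponential covariance above."""
    return 2.0 * (a - 1.0 + math.exp(-a))


numeric = yield_variance_1d(rho_exponential, 3.0)
small_plot_ratio = closed_form(0.01) / 0.01**2   # close to rho(0) = 1
print(numeric, closed_form(3.0), small_plot_ratio)
```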


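If we introduce the spectral density function
$$F(\boldsymbol\omega) = \int_{-\infty}^{\infty} e^{i\boldsymbol\omega\cdot\mathbf{x}}\rho(\mathbf{x})\,d\mathbf{x} \tag{8}$$
and the areal characteristic function
$$G(\boldsymbol\omega) = \int_\Omega e^{i\boldsymbol\omega\cdot\mathbf{x}}\,d\mathbf{x}, \tag{9}$$
then equation (4) can be rewritten as
$$V(\Omega) = \frac{1}{4\pi^2}\int\!\!\int_{-\infty}^{\infty} |G(\boldsymbol\omega)|^2F(\boldsymbol\omega)\,d\boldsymbol\omega. \tag{10}$$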


This formula sometimes has advantages over (4), since the limits of integration are simpler,
the weights given to different wave numbers $\boldsymbol\omega$ are quite evident, and approximate methods
such as the method of steepest descents can be used immediately. The function $G(\boldsymbol\omega)$ is
readily evaluated for simple plot shapes. Thus for the rectangle with sides a, b
$$G(\boldsymbol\omega) = \frac{4\sin(\tfrac{1}{2}a\omega_1)\sin(\tfrac{1}{2}b\omega_2)}{\omega_1\omega_2}, \tag{11}$$
and for a circle with radius a and centre at the origin
$$G(\boldsymbol\omega) = \frac{2\pi a}{\omega}J_1(a\omega), \tag{12}$$
where $J_1$ is the Bessel function of order 1. Here $\omega_1$ and $\omega_2$ are the components of $\boldsymbol\omega$, and $\omega$
is its absolute value.


3. THE INVERSE PROBLEM 


From the practical point of view, it would be useful to be able to invert relation (4), and
express the covariance function $\rho(s)$ in terms of the yield variance $V(\Omega)$, which can
presumably be observed for varying $\Omega$.

This inversion is very simple for the special cases (6) and (7), corresponding to linear
and rectangular plots. We have respectively
$$\rho(x) = \frac{1}{2}\frac{\partial^2V(x)}{\partial x^2}, \tag{13}$$
$$\rho(x,y) = \frac{1}{4}\frac{\partial^4V(x,y)}{\partial x^2\,\partial y^2}. \tag{14}$$
However, there are few cases as tractable as these.
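The inversion for a linear plot can be illustrated numerically: taking it as $\rho(x) = \tfrac{1}{2}V''(x)$, a second difference of the yield variance recovers the covariance. The sketch assumes the yield variance $V(a) = 2(a - 1 + e^{-a})$, which corresponds to an exponential covariance $\rho(x) = e^{-x}$; both the example and the function name are illustrative assumptions.

```python
import math


def rho_from_variance_1d(V, x, h=1e-3):
    """rho(x) = V''(x) / 2, estimated with a central second difference."""
    return (V(x + h) - 2.0 * V(x) + V(x - h)) / (2.0 * h * h)


def V_exponential(a):
    """Yield variance of a line segment when rho(x) = exp(-x) (assumed example)."""
    return 2.0 * (a - 1.0 + math.exp(-a))


recovered = rho_from_variance_1d(V_exponential, 1.5)
print(recovered, math.exp(-1.5))
```

With observed variances in place of the exact V, the second difference would amplify sampling error, so in practice some smoothing of V(x) would presumably be needed first.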
We shall now largely confine ourselves to one general type of case. We shall assume that
the covariance is isotropic, so that $\rho(\mathbf{s})$ is a function of $s = |\mathbf{s}|$ only, and we may write
$$\rho(\mathbf{s}) = \rho(s). \tag{15}$$


Further, we shall assume that we are dealing with a plot of constant shape, which can only
vary similarly, i.e. by changing all its dimensions in the same ratio. For a given plot shape,
the size of the plot is conveniently specified by its largest dimension, x (say), and we shall
write
$$V(\Omega) = V(x). \tag{16}$$
(Thus, for a circular plot x equals the diameter, for a rectangular plot x equals the diagonal,
etc.) Now, corresponding to each plot shape there will be a function K(s) describing the
distribution of distances between two points chosen randomly in the plot, such that
$$V(1) = \int_0^1 K(s)\rho(s)\,ds. \tag{17}$$
If we now increase all linear dimensions in the ratio x : 1, then the infinitesimal elements of
integration will be increased in the ratio $x^{2n}$ : 1, and we shall have quite generally
$$V(x) = x^{2n}\int_0^1 K(s)\rho(xs)\,ds. \tag{18}$$
0 




The form of the relation (18) between V and p invites us to introduce the Mellin transforms 


Vv) = | ” V(s)8"—1ds, (19) 
0 

pr) = | “pis) eds, (20) 

oe 1 

K(v) -{ K(s) s’—"ds, (21) 
0 


in terms of which (18) becomes 


V(v—2n) = K(1—v) p(v). (22) 

We shall consider in §4 the common range of $\nu$ for which the transforms (19) and (20) exist, and (22) is valid. Assuming for the moment that a suitable range exists, we can immediately obtain a solution for $\rho(s)$ by taking the inverse transform of the expression for $\bar\rho$ provided by (22). In fact, if the desired inverse relation to (18) is

$$\rho(x) = x^{-2n}\int_0^\infty L(s)\,V(xs)\,ds, \tag{23}$$

then

$$\bar\rho(\nu) = \bar L(1 + 2n - \nu)\,\bar V(\nu - 2n), \tag{24}$$

where $\bar L(\nu)$ has a similar definition to $\bar K(\nu)$. Comparing (22) and (24) we see that

$$\bar L(\nu) = \frac{1}{\bar K(\nu - 2n)}. \tag{25}$$


In order to be able to carry out the calculation explicitly we must then be able to perform the integration (21) and then invert the $\bar L(\nu)$ yielded by (25). One of the few cases for which this is possible is the rather unrealistic one of a circular plot ($x$ = diameter, $n = 2$). For the circle we obtain by direct methods

$$K(s) = \pi\left[s\cos^{-1}(s) - s^2\sqrt{(1-s^2)}\right], \tag{26}$$

whence

$$\bar K(\nu) = \frac{\pi\,\Gamma(\tfrac12)\,\Gamma\{\tfrac12(\nu+2)\}}{2(\nu+1)\,\Gamma\{\tfrac12(\nu+5)\}}, \tag{27}$$

$$\bar L(\nu) = \frac{2(\nu-3)\,\Gamma\{\tfrac12(\nu+1)\}}{\pi\,\Gamma(\tfrac12)\,\Gamma\{\tfrac12(\nu-2)\}}. \tag{28}$$

However, upon attempting to invert $\bar L(\nu)$, we obtain a divergent integral. This may be taken as an indication of the fact that $\rho(x)$ should be expressed not only in terms of $V(x)$ but also of its derivatives $V^{(j)}(x)$. In view of results (13) and (14) this is not surprising. If for (23) we substitute the more general relation


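$$\rho(x) = x^{-2n}\sum_j \int_0^\infty L_j(s)\,\left[(xs)^j\,V^{(j)}(xs)\right]ds, \tag{29}$$

we find that, provided that

$$\left[s^{\nu+j-1}\,V^{(j)}(s)\right]_0^\infty = 0 \quad (j = 0, 1, 2, \ldots), \tag{30}$$

then

$$\bar L(\nu) = \sum_j (\nu-1)(\nu-2)\cdots(\nu-j)\,\bar L_j(\nu). \tag{31}$$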


Now, the $\bar L(\nu)$ of (28) can be written

$$\frac{1}{2\pi^2}\,B\!\left(\tfrac12, \tfrac{\nu-3}{2}\right)\left[(\nu-1) - (\nu-1)(\nu-2) + (\nu-1)(\nu-2)(\nu-3)\right], \tag{32}$$










P. WHITTLE 341 
which, in view of (29) and the integral representation of the Beta-function, corresponds to the relation

$$\rho(x) = \frac{x^{-4}}{\pi^2}\int_0^1 \left[xs\,V'(xs) - (xs)^2\,V''(xs) + (xs)^3\,V'''(xs)\right]\frac{ds}{s^3\sqrt{(1-s^2)}}. \tag{33}$$
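A quick numerical sanity check of the circular-plot formulas may be helpful here. The sketch below assumes that $V$ is the variance of total yield, so that for a constant covariance $\rho(s) = 1$ a disc of diameter $x$ has $V(x) = (\pi x^2/4)^2$; it also assumes the kernel has the form $\pi[s\cos^{-1}s - s^2\sqrt{(1-s^2)}]$ on a disc of unit diameter and that the inversion integral runs over $0 < s < 1$ with weight $1/\{s^3\sqrt{(1-s^2)}\}$. The function names are illustrative only.

```python
import math

def midpoint(f, a, b, m=20000):
    """Composite midpoint rule for the integral of f over (a, b)."""
    h = (b - a) / m
    return h * sum(f(a + (i + 0.5) * h) for i in range(m))

def K_circle(s):
    """Assumed kernel K(s) for a disc of unit diameter."""
    return math.pi * (s * math.acos(s) - s * s * math.sqrt(1.0 - s * s))

def rho_recovered(x):
    """Apply the inversion integral to V(u) = pi^2 u^4 / 16 (constant covariance 1)."""
    def integrand(theta):
        # s = sin(theta) removes the 1/sqrt(1 - s^2) endpoint singularity
        s = math.sin(theta)
        u = x * s
        d1 = math.pi ** 2 * u ** 3 / 4.0        # V'(u)
        d2 = 3.0 * math.pi ** 2 * u ** 2 / 4.0  # V''(u)
        d3 = 3.0 * math.pi ** 2 * u / 2.0       # V'''(u)
        return (u * d1 - u ** 2 * d2 + u ** 3 * d3) / s ** 3
    return x ** -4 / math.pi ** 2 * midpoint(integrand, 0.0, math.pi / 2.0)

# K must integrate to the squared area of the unit-diameter disc,
# and the inversion must return the constant covariance it started from.
area_squared = midpoint(K_circle, 0.0, 1.0)
recovered = rho_recovered(2.5)
```

Under these assumptions `area_squared` should agree with $\pi^2/16$ and `recovered` with 1, whatever diameter is supplied.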


A case of greater practical interest would be that of the rectangle. However, while K(s) 
is easily calculated for this figure, the transform K(v) is not. 


4. POWER-LAW COVARIANCES 


Let us first return to relation (22). If a function $f(s)$ is $O(s^{-\alpha})$ and $O(s^{-\beta})$ for small and large positive $s$ respectively, then its Mellin transform $\bar f(\nu)$ will certainly exist for

$$\alpha < \operatorname{Re}(\nu) < \beta \tag{34}$$

and $\bar f(\nu)$ will in fact have simple poles at $\alpha$ and $\beta$.

Now, $\rho(s)$ is $O(1)$ at the origin (if we suppose it continuous there), and if we assume it is $O(s^{-\lambda})$ for large $s$, then $\bar\rho(\nu)$ exists for $0 < \operatorname{Re}(\nu) < \lambda$. For small $s$, $K(s)$ is proportional to the surface area of an $n$-dimensional sphere of radius $s$, and is hence $O(s^{n-1})$, while for $s > 1$ it is zero. Consequently $\bar K(\nu)$ exists for $(1-n) < \operatorname{Re}(\nu) < \infty$.

It thus follows that both quantities in the right-hand member of (22) exist for

$$0 < \operatorname{Re}(\nu) < \min(n, \lambda). \tag{35}$$


Since relation (22) determines $\bar V$ in terms of $\bar K$ and $\bar\rho$, the relation is also valid for these values of $\nu$, and $\bar V(\nu)$ exists for

$$-2n < \operatorname{Re}(\nu) < \min(-n, \lambda - 2n), \tag{36}$$

and has simple poles at these extreme values of $\nu$.
We can thus conclude that for small plots

$$V(x) = O(x^{2n}) = O(A^2), \tag{37}$$

while for large plots

$$V(x) = O(x^{n}) = O(A), \tag{38}$$

if $\rho(s)$ falls off more rapidly than $s^{-n}$, or

$$V(x) = O(x^{2n-\lambda}) = O(A^{2-\lambda/n}), \tag{39}$$

if $\rho(s)$ falls off as $s^{-\lambda}$ $(\lambda < n)$. In other words, it is a power law $[\rho(s) \sim Cs^{-n}]$ which marks the transition between the two cases where $V$ is proportional to plot area for large plots, or increases faster than plot area.

Relation (22) may be used in reverse: if $V(x)$ is observed to increase as $A^{\beta}$ for large areas $(1 < \beta < 2)$, then we can deduce that $\rho(s)$ falls off as $s^{-n(2-\beta)}$ for large distances.

It should be emphasized that these conditions hold only for plots of constant shape, since $V$ will in general be a function of shape as well as area. However, some results presented by H. Fairfield Smith (1938) would seem to indicate that this dependence on shape may be weak, and that $V$ may be determined almost entirely by plot area. This is presumably true only if the shape is not too extreme, e.g. not too narrow and elongated.

Fairfield Smith’s results are also of interest in that they provide very convincing evidence of the behaviour associated with equation (39). He found that to a very good approximation the variance of yield per unit area in plots of area $A$ could be represented as a curve const. $A^{-0.749}$. Replacing the index by $-\tfrac34$, we then have

$$V(A) \sim \text{const. } A^{2-\frac34} = \text{const. } A^{\frac54}, \tag{40}$$

corresponding to a covariance function

$$\rho(s) \sim \text{const. } s^{-\frac32}. \tag{41}$$


(These equalities cannot, of course, hold for indefinitely small A and s.) 
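The bookkeeping between the observed index and the covariance decay is easy to get wrong, so a small sketch may help. It assumes the large-plot relation $V = O(A^{2-\lambda/n})$ and that the variance of yield per unit area is $V/A^2$, so that an observed law $A^{-b}$ implies $\lambda = nb$. The helper names are hypothetical.

```python
def total_variance_exponent(b):
    """Exponent of A in V(A) when the variance per unit area behaves as A**(-b)."""
    return 2.0 - b

def covariance_index(b, n=2):
    """Decay index lambda of rho(s) ~ s**(-lambda) implied by the observed index b.

    Matching V ~ A**(2 - b) with V ~ A**(2 - lambda/n) gives lambda = n * b.
    The relation is only meaningful for 0 < b < 1, i.e. lambda < n.
    """
    if not 0.0 < b < 1.0:
        raise ValueError("b must lie strictly between 0 and 1")
    return n * b

lam = covariance_index(0.75)             # field plots, n = 2
v_exp = total_variance_exponent(0.75)
```

With $b = \tfrac34$ and $n = 2$ this reproduces the exponents $\tfrac54$ and $\tfrac32$ of (40) and (41).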

This result, well founded in observation, provides evidence of three intriguing possibilities: (a) that covariances decaying as $s^{-\lambda}$ do occur in nature; (b) that the rate of decay may be so small that yield variance increases faster than plot area; (c) that the observed index $\lambda$ may have simple rational values.

None of the simple linear models hitherto proposed to represent yield variation over an area (see Whittle, 1954; Heine, 1955) lead to covariances of the above type. For instance, the model which relates yield density $\xi(x, y)$ symmetrically to the yield at all points around $(x, y)$:

$$\left[\left(\frac{\partial}{\partial x}\right)^2 + \left(\frac{\partial}{\partial y}\right)^2 - \kappa^2\right]\xi(x, y) = \epsilon(x, y) \tag{42}$$

leads to the covariance function

$$\rho(s) = \text{const. } sK_1(\kappa s) \sim \text{const. } s^{\frac12}e^{-\kappa s} \quad (\kappa s \text{ large}). \tag{43}$$

(Here $K_1$ denotes a modified Bessel function.)

For the one-dimensional problem of yarn diameter variation D. R. Cox (personal communication regarding unpublished work) has proposed a model which does in fact predict a power-law covariance for large $s$. The only multi-dimensional models known to the author which predict a power-law covariance for any range of $s$ are those of turbulence (see Batchelor, 1953, p. 122). For these models dimensional arguments show that the index $\lambda$ must have simple rational values.

It seems likely to the author that any model which is to provide a satisfactory explanation of the power-law decay observed in agricultural work must embody two of the features common to Cox’s yarn model and the turbulence model: (a) it must be non-linear, and (b) it must consider the variate (yield density) to be a function of time as well as of the spatial co-ordinates. (In Cox’s case ‘time’ is discrete, the number of smoothings that the yarn has undergone.) As an example of such a model, one could suppose that fertility gradients in the soil tend to be smoothed out in the course of time by a diffusion process, which is non-linear to the extent that only gradients which are greater than a certain value tend to diminish.

The value of the index $\lambda$ deserves some discussion. Is it fortuitous that Fairfield Smith’s estimated index $b = 0.749$ lies so near the rational value $\tfrac34$ (corresponding to $\lambda = \tfrac32$)? Certainly little can be inferred from this single instance. However, Fairfield Smith lists the $b$ values estimated from uniformity data for several different kinds of crop. The histogram of the thirty-nine estimated $b$ values has its main peak in the interval $b$ = 0.41-0.50, and a second peak and upper cut-off point at $b$ = 0.71-0.80. This provides at least a suggestion that the values $b = \tfrac12$ and $\tfrac34$ (corresponding to $\lambda = 1, \tfrac32$) are in some way distinguished. Perhaps one should say no more until further data or predictions from theory are available.


The author is grateful to Dr D. R. Cox for drawing his attention to the observations on 
yarn diameter and flood height. 


REFERENCES 


BATCHELOR, G. K. (1953). The Theory of Homogeneous Turbulence. Cambridge University Press.

FAIRFIELD SMITH, H. (1938). J. Agric. Sci. 28, 1-23.

FELLER, W. (1951). The asymptotic distribution of the range of independent random variables. Ann. Math. Statist. 22, 427-32.

HEINE, V. (1955). Models for two-dimensional stationary stochastic processes. Biometrika, 42, 170-8.

WHITTLE, P. (1954). On stationary processes in the plane. Biometrika, 41, 434-49.













ON THE CONSTRUCTION OF SIGNIFICANCE TESTS ON THE 
CIRCLE AND THE SPHERE 


By G. S. WATSON* AND E. J. WILLIAMS†


1. INTRODUCTION


A number of recent papers have dealt with the probability density, in two and three dimensions, proportional to $\exp(\kappa\cos\theta)$, where $\kappa$ is a precision constant, and $\theta$ is the angle between an observed unit vector and the population mean unit vector or polar vector. The purposes of these investigations have been (i) to derive, from observed results, limits within which the unknown polar vector is likely to lie, and (ii) to test the homogeneity of different sets of observations, both in their precision and in their direction. These distributions and the tests associated with them thus produce the circular and spherical analogues of the usual tests in Euclidean space, based on the normal distribution.

For example, in palaeo-magnetism, a sample of rock specimens is collected from a site 
and the direction of remanent magnetism of each specimen is measured. The directions 
may be loosely or closely grouped about a mean direction. It has been shown (Irving & 
Watson, 1956) that the above distribution is a good fit in the case of stably magnetized 
rocks, for both large and small dispersions. From the knowledge of the mean direction of 
magnetization, the geophysicist hopes to learn about the earth’s magnetic field at the time 
the rocks were magnetized. Such knowledge is being used to throw light on the stability 
or otherwise of the earth’s magnetic axis and on the positions of the continents in different 
geological epochs. 

Fisher (1953) considered the three-dimensional case where the observations may be regarded as points on a sphere. He derived the maximum-likelihood estimates of $\kappa$ and the polar direction and provided the basic distribution theory which he used to find the fiducial test of a prescribed polar direction when $\kappa$ is unknown. Watson (1956a, b) gave a significance table for the test of $\kappa = 0$, and approximate tests for the equality of the $\kappa$’s and polar directions for any number of populations.

Gumbel, Greenwood & Durand (1953) and Greenwood & Durand (1955) studied the two-dimensional case where the observations may be regarded as points on the unit circle. They gave a table to facilitate the calculation of the maximum-likelihood estimate of $\kappa$ and a significance table for testing $\kappa = 0$. For this case, less is known of the distribution theory. Pearson (1906), Kluyver (1906) and Rayleigh (1919) gave results on the distribution of the length of the resultant vector of a sample when $\kappa = 0$. Greenwood & Durand gave some further results when $\kappa \neq 0$.

In the present paper, the fundamental property of sufficient statistics will be used to 
derive tests, free of nuisance parameters, for direction and homogeneity in both the two- 
and three-dimensional cases. In only one situation, however, can we give the required 
exact distribution. Inequalities are suggested for some of the tests proposed; and an 
arithmetical example suggests that these may be useful in practical applications. 


* The Australian National University, Canberra, A.C.T. † C.S.I.R.O., Melbourne.










G. S. WaTSoN AND E. J. WILLIAMS 345 


2. BASIC RESULTS
We write the density function as

$$C_p(\kappa)\,e^{\kappa\cos\theta}, \tag{1}$$

with $p$ the number of dimensions and $C_p(\kappa)$ the corresponding constant factor. In general, the range of $\theta$ is $0 < \theta < \pi$, but in the two-dimensional case it is conventional (though not necessary) to take $0 < \theta < 2\pi$. The constant factor is, in general, the reciprocal of

$$\frac{2\pi^{\frac12(p-1)}}{\Gamma\{\tfrac12(p-1)\}}\int_0^\pi \sin^{p-2}\theta\;e^{\kappa\cos\theta}\,d\theta, \tag{2}$$

which may be expressed in terms of the imaginary Bessel function $I_{\frac12 p-1}(\kappa)$ or (when $p$ is odd) in terms of $\sinh\kappa$ and $\cosh\kappa$. When $p = 2$ the factor must be halved. Then

$$C_2(\kappa) = 1/(2\pi I_0(\kappa)) \tag{3}$$

and

$$C_3(\kappa) = \kappa/(4\pi\sinh\kappa). \tag{4}$$
The probability density of a sample of $N$ is a function only of

$$\sum_{i=1}^{N}\cos\theta_i = X.$$

Hence, when the polar vector is assumed known, so that the $\theta_i$ can be determined from the observed vectors, $X$ is a sufficient statistic for $\kappa$, the maximum-likelihood (ML) equations for $\kappa$ being

$$\frac{I_0'(\kappa)}{I_0(\kappa)} = \frac{X}{N} \quad (p = 2) \tag{5}$$

and

$$\coth\kappa - \frac{1}{\kappa} = \frac{X}{N} \quad (p = 3). \tag{6}$$
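Neither ML equation has a closed-form solution for $\kappa$, but both left-hand sides increase steadily with $\kappa$, so a simple bisection suffices. The sketch below is illustrative rather than the authors' procedure: it assumes $I_0'(\kappa) = I_1(\kappa)$, sums the Bessel power series directly, and restricts the search to $0.001 \le \kappa \le 100$. All function names are hypothetical.

```python
import math

def bessel_ratio(k, terms=400):
    """I_1(k)/I_0(k), from the power series of the modified Bessel functions."""
    q = k * k / 4.0
    term0, term1 = 1.0, k / 2.0      # j = 0 terms of I_0 and I_1
    i0, i1 = term0, term1
    for j in range(1, terms):
        term0 *= q / (j * j)          # q**j / (j!)**2
        term1 *= q / (j * (j + 1))    # (k/2) q**j / (j! (j+1)!)
        i0 += term0
        i1 += term1
    return i1 / i0

def langevin(k):
    """coth(k) - 1/k, the left-hand side of the p = 3 equation."""
    return 1.0 / math.tanh(k) - 1.0 / k

def ml_kappa(mean_cos, p):
    """Solve the ML equation for kappa given mean_cos = X/N (or R/N)."""
    if p == 2:
        f = bessel_ratio
    elif p == 3:
        f = langevin
    else:
        raise ValueError("only p = 2 and p = 3 are handled in this sketch")
    lo, hi = 1e-3, 100.0
    if not f(lo) < mean_cos < f(hi):
        raise ValueError("mean_cos lies outside the range covered by the search")
    for _ in range(200):              # bisection on an increasing function
        mid = 0.5 * (lo + hi)
        if f(mid) < mean_cos:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

k2 = ml_kappa(0.5, 2)
k3 = ml_kappa(0.5, 3)
```

For the same mean cosine the spherical estimate is larger than the circular one, since the extra dimension dilutes the concentration.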


When the polar vector is not known, its ML estimator is, for all values of $p$, the set of direction cosines of the vector resultant of the sample of $N$ unit vectors; the ML equations for $\kappa$ are (5) and (6) (in the cases $p = 2$ or 3) with $X$ replaced by $R$, the length of the resultant.

If $c$ is the cosine of the angle between the resultant and polar vectors, then $X = Rc$; $R$ is determinable but not $c$ unless the polar vector is known. From the form of the distribution it is seen that $Rc$ is a sufficient statistic for $\kappa$. Hence the joint probability density of $R$ and $c$, when $\kappa \neq 0$, is found by multiplying the joint probability density, when $\kappa = 0$, by the factor

$$[C_p(\kappa)]^N e^{\kappa Rc}.$$

We therefore first consider the uniform distribution given by $\kappa = 0$. When $\kappa = 0$, $R$ and $c$ are independently distributed.
In two dimensions, the density of $R$ may be expressed as an integral (Kluyver, 1906)

$$R\int_0^\infty [J_0(t)]^N J_0(Rt)\,t\,dt, \tag{7}$$

or as an asymptotic series (Pearson, 1906). These representations are discussed by Greenwood & Durand (1955). Hence, in this case, the joint density of $R$ and $c$ is given by

$$\frac{e^{\kappa Rc}}{[I_0(\kappa)]^N}\,R\int_0^\infty [J_0(t)]^N J_0(Rt)\,t\,dt\;\frac{dR\,dc}{\pi(1-c^2)^{\frac12}}, \tag{8}$$

which is the same as formula (3.6) of Greenwood & Durand.











346 Significance tests on the circle and the sphere 


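In three dimensions, Fisher gives the joint density

$$\left(\frac{\kappa}{2\sinh\kappa}\right)^N e^{\kappa Rc}\,\phi_N(R)\,R\,dR\,dc, \tag{9}$$

where

$$\phi_N(R) = \frac{1}{(N-2)!}\sum_{s=0}^{\infty}\binom{N}{s}(-1)^s\langle N - R - 2s\rangle^{N-2},$$

with the notation $\langle x\rangle = x$ if $x > 0$, $\langle x\rangle = 0$ if $x < 0$.

It follows from these results, or can be derived directly, that the density of $X = Rc$ in two dimensions is

$$\frac{e^{\kappa X}}{\pi[I_0(\kappa)]^N}\int_0^\infty [J_0(t)]^N\cos Xt\,dt, \tag{10}$$

and in three dimensions is

$$\left(\frac{\kappa}{2\sinh\kappa}\right)^N\frac{e^{\kappa X}}{(N-1)!}\sum_{s=0}^{\infty}\binom{N}{s}(-1)^s\langle N - X - 2s\rangle^{N-1}. \tag{11}$$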


For confidence limits on $\kappa$, when the polar directions are known, and for tests of a prescribed polar vector, the probability densities (10) and (11) suffice, since in either case the null hypothesis specifies $c$ so that $X$ is determinable.

To test the homogeneity of the values of $\kappa$ for several sets of results, or to compare polar vectors, the same value of $\kappa$ being assumed, we need to work with the sample resultants. In such cases the joint distribution of the sample resultants and their overall resultant is required. We suppose that there are samples of sizes $N_1, N_2, \ldots$ with resultants of length $R_1, R_2, \ldots$ and that the overall resultant has length $R$.

For resultants in two dimensions, the joint distribution is not easy to determine. We give here the results only for two samples. If the integral in (8) is denoted by $\psi_N(R)$, then the joint density of $R_1$, $c_1$, $R_2$ and $c_2$ from two independent samples is

$$\frac{\exp\{\kappa(R_1c_1 + R_2c_2)\}}{[I_0(\kappa)]^N}\;\frac{R_1\psi_{N_1}(R_1)\,R_2\psi_{N_2}(R_2)}{\pi^2\{(1-c_1^2)(1-c_2^2)\}^{\frac12}}. \tag{12}$$

We put $R_1c_1 + R_2c_2 = Rc$. Then the joint distribution of $R_1$, $R_2$, $R$ and $c$ may be found by the following simple geometrical argument.

Since $Rc$ is sufficient for $\kappa$, the factor introducing $\kappa$ may be ignored in the derivation, and brought back into the final result. Thus $\kappa$ may be assumed zero, so that the angles are uniformly distributed independently of the $R$’s.

$R^2$ ranges from $(R_1 - R_2)^2$ to $(R_1 + R_2)^2$, being given by

$$R^2 = (R_1 - R_2)^2\cos^2\lambda + (R_1 + R_2)^2\sin^2\lambda,$$

where $\lambda$ is half the angle between the two vectors. Also, since $\lambda$ is uniformly distributed from 0 to $\pi$, we may determine the conditional distribution of $R$ given $R_1$, $R_2$. For

$$R\,dR = 2R_1R_2\sin 2\lambda\,d\lambda,$$

while

$$\sin 2\lambda = \{[(R_1 + R_2)^2 - R^2][R^2 - (R_1 - R_2)^2]\}^{\frac12}/(2R_1R_2),$$

so that

$$R\,dR = \{[(R_1 + R_2)^2 - R^2][R^2 - (R_1 - R_2)^2]\}^{\frac12}\,d\lambda.$$

Hence the elementary conditional probability is

$$\frac{d\lambda}{\pi} = \frac{R\,dR}{\pi\{[(R_1 + R_2)^2 - R^2][R^2 - (R_1 - R_2)^2]\}^{\frac12}} = G(R_1, R_2, R)\,dR \quad\text{(say)}.$$







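The joint probability density of $R_1$, $R_2$, $R$ and $c$ (still with $\kappa = 0$) is therefore

$$\frac{R_1\psi_{N_1}(R_1)\,R_2\psi_{N_2}(R_2)\,G(R_1, R_2, R)}{\pi(1-c^2)^{\frac12}}. \tag{13}$$

Bringing in the factor introducing $\kappa$, we find the non-null joint density to be ($N = \Sigma N_i$)

$$\frac{e^{\kappa Rc}}{[I_0(\kappa)]^N}\;\frac{R_1\psi_{N_1}(R_1)\,R_2\psi_{N_2}(R_2)\,G(R_1, R_2, R)}{\pi(1-c^2)^{\frac12}}. \tag{14}$$

Finally, integrating out $c$, we have for the joint density of $R_1$, $R_2$ and $R$

$$\frac{I_0(\kappa R)}{[I_0(\kappa)]^N}\,R_1\psi_{N_1}(R_1)\,R_2\psi_{N_2}(R_2)\,G(R_1, R_2, R). \tag{15}$$

The density of $R$ is found, by integrating (8) over all values of $c$, as

$$\frac{I_0(\kappa R)}{[I_0(\kappa)]^N}\,R\psi_N(R). \tag{16}$$

Hence the conditional density of $R_1$ and $R_2$, given $R$, is

$$\frac{R_1\psi_{N_1}(R_1)\,R_2\psi_{N_2}(R_2)\,G(R_1, R_2, R)}{R\psi_N(R)}. \tag{17}$$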


Being independent of $\kappa$, this conditional density provides the basis for exact significance tests. So far it has not been possible to generalize this result to more than two samples.

For resultants in three dimensions, Fisher (1953) has shown that, for any number of samples, the joint-probability density of the $R_i$ and $R$ is

$$\left(\frac{\kappa}{2\sinh\kappa}\right)^N\frac{2\sinh\kappa R}{\kappa}\;\Pi\,\phi_{N_i}(R_i), \tag{18}$$

while the density of $R$ is

$$\left(\frac{\kappa}{2\sinh\kappa}\right)^N\frac{2\sinh\kappa R}{\kappa}\;\phi_N(R), \tag{19}$$

so that the conditional density of the $R_i$, given $R$, is

$$\Pi\,\phi_{N_i}(R_i)/\phi_N(R). \tag{20}$$

This conditional density is the basis for any exact significance tests.


3. TESTS OF SIGNIFICANCE 


The required significance tests for data following circular or spherical distributions have been discussed in §1.

When $\kappa$ is large we may make use of the fact that $2\kappa(1 - \cos\theta)$ is distributed approximately as $\chi^2$ with $p - 1$ degrees of freedom. Hence, we have approximately

$$2\kappa(N - X) = \chi^2_{(p-1)N}, \qquad 2\kappa(N - R) = \chi^2_{(p-1)(N-1)}. \tag{21}$$

We may therefore take $N - \Sigma R_i$ and $\Sigma R_i - R$ (each multiplied by $2\kappa$, as in (21)) to be independently distributed as $\chi^2$ with $(p-1)(N-q)$ and $(p-1)(q-1)$ degrees of freedom respectively, where $q$ is the number of samples.













The tests based on these results were discussed for the spherical case by Watson (1956a). They will certainly be accurate when $\kappa$ is large, but seem to be accurate also for a wide range of $\kappa$ and $N$. As is usual in problems of this kind, it is hard to assess the degree of approximation of (21). In §4 some more results will be given to justify the tests derived from (21).
To test a given value of $\kappa$ or to derive confidence limits for $\kappa$, we should use

$2\kappa(N - X)$, when the polar vector is known, or

$2\kappa(N - R)$, when the polar vector is unknown.

Likewise, to test the homogeneity of $\kappa$-values for different samples, we should compare corresponding values of $N - X$ or $N - R$.

To test the equality of polar vectors for different samples, we should use, generalizing Watson (1956a),

$$\frac{(N - q)(\Sigma R_i - R)}{(q - 1)(N - \Sigma R_i)}, \tag{22}$$

which is distributed as $F$ with $(p-1)(q-1)$ and $(p-1)(N-q)$ degrees of freedom.

The distributions of all the test statistics given above carry $\kappa$ as a nuisance parameter. The tests given below do not, but the required distributions, with one exception, are not known. However, some inequalities are given which may help in their application.


(a) Tests of hypotheses concerning $\kappa$’s

It has been seen in §2 that when the polar vector is known, $X = \Sigma\cos\theta$ is a sufficient statistic for $\kappa$. To test that several populations with known polar vectors have a common value of $\kappa$, suppose samples of $N_1, N_2, \ldots$ with values $X_1, X_2, \ldots$ are available. Then the joint density of $X_1, X_2, \ldots$, given the value $X$ of $\Sigma X_i$, may be written as

$$\frac{A_{N_1}(X_1)\,A_{N_2}(X_2)\cdots}{A_N(X)}, \tag{23}$$

where $A_N(X)$ stands for the factor of (10) or (11) independent of $\kappa$. The distribution of any test function found from the density (23) will be independent of $\kappa$. However, as the case of a known polar vector is of no practical interest, it will not be pursued further.

When the polar vectors are unknown, the test of the equality of $\kappa$ in several populations must be based on the resultants $R_1, R_2, \ldots$ of the several samples, and the overall resultant $R$. Formulae (17) or (20) show that the joint density of $R_1, R_2, \ldots$ given $R$ is independent of $\kappa$, so that an exact test can be made. When $\kappa$ is large, or when the $N_i$ are all large, the test function must be sensitive to deviations of the ratios $R_i/N_i$ from equality. Under these conditions, the tests based on (21) are likely to be adequate. Nothing is known of suitable tests when both $\kappa$ and the $N_i$ are small.


(b) Tests of hypotheses concerning polar vectors 


(i) A single prescribed polar vector. If $\kappa$ is known, a test of a prescribed polar vector may be made most conveniently by using the density of $X$ ((10) or (11)). In practice $\kappa$ is not known. To derive an analogue to the single-sample Student t-test, we consider the density of $R$ given $X$. Since the density of $X$ is an even function of $X$, these are strictly tests of a prescribed axis. In practice, however, the observations will usually be so grouped that no confusion will arise.














In two dimensions, this is found from (7) and (10) to be

$$\frac{R\int_0^\infty [J_0(t)]^N J_0(Rt)\,t\,dt}{(R^2 - X^2)^{\frac12}\int_0^\infty [J_0(t)]^N\cos Xt\,dt}, \tag{24}$$

where $R > X$. If the sample mean direction is far from the prescribed direction, $R$ will be much greater than $X$. Thus to define a possible significance test, we find a value of $R$, $R_0$ say, so that $P(R > R_0 \mid X) = \alpha$, i.e. so that

$$\int_{R_0}^{N}\frac{R\,dR}{(R^2 - X^2)^{\frac12}}\int_0^\infty [J_0(t)]^N J_0(Rt)\,t\,dt = \alpha\int_0^\infty [J_0(t)]^N\cos Xt\,dt. \tag{25}$$

This is difficult to solve for $R_0$. Since, however, the integral of (24) from $X$ to $N$ is unity, we have the relation

$$\int_0^\infty [J_0(t)]^N\cos R_0t\,dt = \int_{R_0}^{N}\frac{R\,dR}{(R^2 - R_0^2)^{\frac12}}\int_0^\infty [J_0(t)]^N J_0(Rt)\,t\,dt > \int_{R_0}^{N}\frac{R\,dR}{(R^2 - X^2)^{\frac12}}\int_0^\infty [J_0(t)]^N J_0(Rt)\,t\,dt,$$

since $R > R_0 > X$. Thus, in two dimensions,

$$P(R > R_0 \mid X) < \frac{\int_0^\infty [J_0(t)]^N\cos R_0t\,dt}{\int_0^\infty [J_0(t)]^N\cos Xt\,dt}. \tag{26}$$


The functions on the right-hand side of (26) could, with some difficulty, be evaluated numerically for various $R$, $X$ and $N$ or, more approximately, be replaced by their saddle-point approximations or by their approximations when $R$ and $X$ are large, i.e. near $N$.

In three dimensions, we have instead of (24)

$$\frac{\phi_N(R)}{\dfrac{1}{(N-1)!}\displaystyle\sum_{s=0}^{\infty}\binom{N}{s}(-1)^s\langle N - X - 2s\rangle^{N-1}}, \tag{27}$$

so that, by an easy calculation,

$$P(R > R_0 \mid X) = \frac{\displaystyle\sum_{s=0}^{\infty}\binom{N}{s}(-1)^s\langle N - R_0 - 2s\rangle^{N-1}}{\displaystyle\sum_{s=0}^{\infty}\binom{N}{s}(-1)^s\langle N - X - 2s\rangle^{N-1}}. \tag{28}$$

It will be seen that the right-hand sides of (26) and (28) have the same form, i.e. the ratio of the density of $X$ evaluated at $R_0$ and at $X$. Furthermore, the ratio of the leading terms in (28),

$$\left(\frac{N - R_0}{N - X}\right)^{N-1} \quad\text{or}\quad \left(\frac{N - R}{N - Rc}\right)^{N-1},$$

is Fisher’s first approximation to the probability of a cosine less than $c$, calculated fiducially. In computing (28), this factor is best taken separately in order to reduce the number of working figures.
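The exact three-dimensional tail probability is a ratio of two finite alternating sums and is easily programmed. The sketch below assumes the bracket convention $\langle x\rangle = x$ for $x > 0$ and 0 otherwise, with both sums raised to the power $N - 1$; the function names are illustrative. It is tried on the figures of Fisher's example (a) quoted in §4 ($N = 9$, $R = 8.77203$, $c = 0.98820$), for which the probability should be close to 5%.

```python
from math import comb

def bracket_sum(N, y, power):
    """Sum over s of C(N, s) (-1)^s <N - y - 2s>^power, with <x> = max(x, 0)."""
    total = 0.0
    for s in range(N + 1):
        x = N - y - 2 * s
        if x <= 0:
            break                     # all later brackets vanish as well
        total += comb(N, s) * (-1) ** s * x ** power
    return total

def prob_R_exceeds(N, R0, X):
    """Exact P(R > R0 | X) in three dimensions, as a ratio of bracket sums."""
    return bracket_sum(N, R0, N - 1) / bracket_sum(N, X, N - 1)

def leading_term(N, R0, X):
    """Ratio of the s = 0 terms only (Fisher's first approximation)."""
    return ((N - R0) / (N - X)) ** (N - 1)

N, R = 9, 8.77203
X = R * 0.98820
p_exact = prob_R_exceeds(N, R, X)
p_lead = leading_term(N, R, X)
```

Because $N - X$ is well below 2 here, only the $s = 0$ terms survive and the two values coincide. For large $N$ and widely dispersed samples the alternating sums can lose accuracy in floating point, which is one reason for taking the leading factor out separately.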













(ii) Comparison of several polar vectors. It will often be required to test the equality of the polar vectors of several populations. To devise a possible test statistic, we observe that, if the sample resultants have very similar orientations, $R_1 + R_2 + \ldots$ will not be very much greater than $R$, while if the orientations are not alike $R_1 + R_2 + \ldots$ will be much larger than $R$. Since the density of $R_1 + R_2 + \ldots$ given $R$ is free of $\kappa$, it provides a possible exact test.

In the case of two populations in two dimensions, the joint density of $R_1$ and $R_2$, given $R$, is given by (17), where the region of joint variation of $R_1$, $R_2$ and $R$ is defined by

$$0 < R_1 < N_1, \quad 0 < R_2 < N_2, \quad |R_1 - R_2| < R < R_1 + R_2. \tag{29}$$

We require a value $R_0$ such that

$$P(R_1 + R_2 > R_0 \mid R) = \alpha. \tag{30}$$

As with equation (25), it seems difficult to solve for $R_0$, but an inequality is easily obtained by the method used to derive (26). Thus

$$P(R_1 + R_2 > R_0 \mid R) < \frac{R_0\psi_N(R_0)}{R\psi_N(R)}. \tag{31}$$

For application of (31), the function $\psi_N$ must be found. Use may be made of the fact that $\psi_N(R)$ is asymptotically $2\exp(-R^2/N)/N$.
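With the asymptotic form substituted for $\psi_N$, the bound on $P(R_1 + R_2 > R_0 \mid R)$ reduces to a one-line expression. The sketch below is an illustration only: it assumes the asymptotic form $2\exp(-R^2/N)/N$ is adequate at both $R_0$ and $R$, which need not hold for small $N$ or for resultants close to $N$, and the sample values used are hypothetical.

```python
import math

def psi_asymptotic(N, R):
    """Large-N form of psi_N(R): 2 exp(-R^2 / N) / N."""
    return 2.0 * math.exp(-R * R / N) / N

def two_sample_bound(N, R0, R):
    """Upper bound R0 psi_N(R0) / (R psi_N(R)) for P(R1 + R2 > R0 | R)."""
    return (R0 * psi_asymptotic(N, R0)) / (R * psi_asymptotic(N, R))

# Hypothetical figures: N = 20 observations in all, overall resultant R = 10,
# and sum of the two sample resultants R0 = 12.
bound = two_sample_bound(20.0, 12.0, 10.0)
```

The bound simplifies to $(R_0/R)\exp\{-(R_0^2 - R^2)/N\}$, so it falls off quickly as the sum of the sample resultants moves away from the overall resultant.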
An alternative test is given by the function 
gy a + Ba) — Bi] LRP — (2, — By) r 
N(N—1)(N -2) "(xa Rey NER? (RE RB] (32) 
which is distributed approximately as $\chi^2$ with 1 degree of freedom. This tests equality or oppositeness of direction; that is, departures of $R$ either from $R_1 + R_2$ or from $|R_1 - R_2|$.
For several samples in three dimensions, the joint density of $R_1, R_2, \ldots$ given $R$ is given by (20). Once again, only an inequality is easily found. This is that

$$P(\Sigma R_i > R_0 \mid R) < \frac{\phi_N(R_0)}{\phi_N(R)}. \tag{33}$$

Here the function $\phi_N(R)$ is easy to compute in any application. Actually there are cases where (33) is an equality. For example, with two samples only, the domain of $R_1$ and $R_2$ given $R$ is (29). A little consideration shows that there will be equality in (33) if $R > \max(N_i)$ or if $R_0 + R > \max(2N_i)$.


4. NUMERICAL APPLICATION AND COMPARISON OF SIGNIFICANCE TESTS 


In this section, we apply some of the tests suggested here to the numerical examples given by Fisher (1953) and Watson (1956a) and compare the results with those obtained in these papers. The agreement is remarkable, and, since Watson’s tests are the easiest to apply, it is suggested that they may be used with confidence despite their approximate derivation.

To begin with we consider Fisher’s example (a). It is required to find the 5% zone of confidence for the polar direction. A sample of $N = 9$ is available and $R = 8.77203$. Fisher uses his approximate form for the probability that the cosine of the angle between the resultant and the polar vector is less than $c$,

$$P = \left(\frac{N - R}{N - Rc}\right)^{N-1}, \tag{34}$$







with $P = 0.05$. He finds $c = 0.98820$ so that the confidence zone includes only directions less than 8.8° away from the resultant direction. We have shown that Fisher’s solution is, in general, a first approximation to our exact expression (28). Here, in fact, they coincide as $R$ is so close to $N$. So Fisher’s solution is also that obtained from (28). Watson’s approximate result in this case is that

$$\frac{(N - 1)(R - Rc)}{N - R} = F_{2,\,2(N-1)}, \tag{35}$$

which is equivalent to Fisher’s result (34). Incidentally, the estimate of $\kappa$ from these data is 35.1.
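The equivalence of the two results can be checked directly, since the upper tail of an $F$ variate with 2 and $m$ degrees of freedom has a closed form. The following is a sketch which assumes that Watson's statistic is $f = (N-1)(R - Rc)/(N - R)$, referred to $F$ with 2 and $2(N-1)$ degrees of freedom.

```latex
\begin{aligned}
P(F_{2,m} > f) &= \left(1 + \frac{2f}{m}\right)^{-m/2}, \\
\text{so with } m = 2(N-1):\quad
P &= \left(1 + \frac{R - Rc}{N - R}\right)^{-(N-1)}
   = \left(\frac{N - R}{N - Rc}\right)^{N-1}.
\end{aligned}
```

The last expression is exactly Fisher's approximate form for $P$, so the two procedures give identical values of $c$ at any significance level.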

In Fisher’s example (b), a sample of 45 more widely dispersed ($\kappa = 7.3$) directions is examined. Since $R = 38.9946$, Fisher finds that the 5% zone of confidence is limited by $c = 0.98915$, i.e. $\theta = 8.45°$, by his formula given above. Taking $F_{2,\,88}(5\%) = 3.10$, formula (35) gives an identical value of $c$. The extra terms in (28) and also the extra terms in Fisher’s more complicated formula for $P$ make no difference to the value of $c$. Thus the approximation (34) is entirely satisfactory for small values of $\kappa$ and medium values of $N$.

Needless to say, had the various formulae been used for a test of significance of a given polar direction on the above data, similar agreements would have been found.

Watson gives an example of a test that the polar directions of three populations are identical. In the data analysed

            Sample 1    Sample 2    Sample 3
   N_i         10          11          15
   R_i        6.990       8.212      12.194

while for the combined sample $N = 36$, $R = 26.902$. In this situation, Watson proposed the approximation

$$F_{4,\,2(N-3)} = \frac{(N - 3)}{2}\,\frac{\Sigma R_i - R}{N - \Sigma R_i}. \tag{36}$$

The right-hand side is found to be 1.81. Graphical interpolation in the $F$-tables shows that $F = 1.81$ corresponds to a probability of about 13%. The inequality (33) of this paper may be applied to this problem by taking

$$R_0 = \Sigma R_i = 27.396, \qquad R = 26.902.$$

Now

$$\frac{\phi_N(R_0)}{\phi_N(R)} = \left(\frac{N - R_0}{N - R}\right)^{N-2}\frac{\displaystyle\sum_{s}(-1)^s\binom{N}{s}\left\langle\frac{N - R_0 - 2s}{N - R_0}\right\rangle^{N-2}}{\displaystyle\sum_{s}(-1)^s\binom{N}{s}\left\langle\frac{N - R - 2s}{N - R}\right\rangle^{N-2}} = 0.1499 \times 1.0034 = 0.1504.$$

Since 0.15 is only slightly greater than 0.13, it seems likely that very little is lost in forming the inequality (33) (i.e. it is close to equality) in practice and that the $F$-approximation is again accurate.
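The arithmetic of this bound can be reproduced in a few lines. The sketch below assumes that $\phi_N$ is proportional to the alternating sum of brackets raised to the power $N - 2$, so that the factorial cancels in the ratio; it separates the leading term from the correction factor as recommended for hand computation. The function name is illustrative.

```python
from math import comb

def phi_ratio(N, R0, R):
    """Return (leading term, correction factor, product) for phi_N(R0)/phi_N(R)."""
    def scaled_sum(y):
        # sum of (-1)^s C(N, s) <(N - y - 2s)/(N - y)>^(N-2); the s = 0 term is 1
        base = N - y
        total = 0.0
        for s in range(N + 1):
            x = base - 2 * s
            if x <= 0:
                break
            total += comb(N, s) * (-1) ** s * (x / base) ** (N - 2)
        return total
    leading = ((N - R0) / (N - R)) ** (N - 2)
    correction = scaled_sum(R0) / scaled_sum(R)
    return leading, correction, leading * correction

# Three-sample data from the text: N = 36, sum of sample resultants 27.396,
# overall resultant 26.902.
leading, correction, bound = phi_ratio(36, 27.396, 26.902)
```

The three values come out close to the 0.1499, 1.0034 and 0.1504 quoted above, the small differences being in the last figure of the correction factor.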

Finally, we notice in this case that the values of $N$ and $\kappa$ in each sample are both small. Furthermore, the value of $\phi_N(R_0)/\phi_N(R)$ is accurately given by the leading term so that one might suggest the simple approximation

$$P(\Sigma R_i > R_0 \mid R) \approx \left(\frac{N - R_0}{N - R}\right)^{N-2}. \tag{37}$$

Assuming that the approximations (21) are adequate for testing purposes, the relation (37) has the circular analogues

$$P(R > R_0 \mid X) \approx \left(\frac{N - R_0}{N - X}\right)^{\frac12 N - 1} \tag{38}$$

and

$$P(\Sigma R_i > R_0 \mid R) \approx \left(\frac{N - R_0}{N - R}\right)^{\frac12(N-1) - 1}. \tag{39}$$

The merits of these approximations are unknown.


REFERENCES 


FISHER, R. A. (1953). Dispersion on a sphere. Proc. Roy. Soc. A, 217, 295.

GREENWOOD, J. A. & DURAND, D. (1955). The distribution of length and components of the sum of random unit vectors. Ann. Math. Statist. 26, 233.

GUMBEL, E. J., GREENWOOD, J. A. & DURAND, D. (1953). The circular normal distribution: theory and tables. J. Amer. Statist. Ass. 48, 131.

IRVING, E. & WATSON, G. S. (1956). The use of statistics in studies of the magnetic directions in rocks (unpublished).

KLUYVER, J. C. (1906). A local probability problem. Proc. Acad. Sci. Amst. 8, 341.

PEARSON, K. (1906). A mathematical theory of random migration. Drap. Co. Res. Mem. no. 3.

STRUTT, J. W. (LORD RAYLEIGH) (1919). On the problem of random vibrations and random flights in one, two and three dimensions. Phil. Mag. (6), 37, 321.

WATSON, G. S. (1956a). Analysis of dispersion on a sphere. Mon. Not. R. Astr. Soc. Geophys. Suppl. 7, 153.

WATSON, G. S. (1956b). A test for randomness of directions. Mon. Not. R. Astr. Soc. Geophys. Suppl. 7, 164.







NOTES ON BIAS IN ESTIMATION 


By M. H. QUENOUILLE 


Research Techniques Unit, London School of Economics and Political Science 


1. One of the commonest problems in statistics is, given a series of observations $x_1, x_2, \ldots, x_n$, to find a function of these, $t_n(x_1, x_2, \ldots, x_n)$, which should provide an estimate of an unknown parameter $\theta$.

The desirable properties of estimation procedures have been discussed fully elsewhere. 
They are: 

(a) That the estimator should be efficient according to some definition of efficiency pre- 
viously arranged. Most commonly, the reciprocal of the variance of the estimates is taken 
as a measure of its efficiency, as this is most useful where central limit theory may be 
relevant. 

(b) That the estimator should utilize all the information contained in the observations, $x_1, x_2, \ldots, x_n$ concerning the parameter $\theta$. This is not always possible, but, if such an estimator exists, it is called sufficient.

(c) That the estimator should be consistent, i.e. $t_n$ converges in some probabilistic sense to $\theta$, usually $\lim_{n\to\infty} t_n \to \theta$.

(d) That the estimator should be unbiased, i.e. $E(t_n) = \theta$.

The method of maximum likelihood is popular in that it satisfies properties (a) to (c), whence, by evaluating $E(t_n)$, an unbiased statistic may be derived. That such evaluation is necessary is obvious when it is remembered that $\psi(t_n)$ is, by the same theory, the estimator of $\psi(\theta)$, and, since in general $E[\psi(t_n)] \neq \psi[E(t_n)]$, it will be the exception rather than the rule for a maximum-likelihood estimator to be unbiased.


Provided the exceptions may be simply evaluated no real difficulty arises. However, 


often the complexity of the evaluation presents a major drawback and some simple approach 
is then desirable. 


2. If the observations are taken in random order, the estimator $t_n$ may often be written

$$t_n = t_n(k_1, k_2, \ldots, k_m),$$

where $k_1, k_2, \ldots, k_m$ are unbiased estimates of the cumulants $\kappa_1, \kappa_2, \ldots, \kappa_m$. Then, provided that

(a) $m$ is independent of $n$,
(b) the function $t_n$ is capable of Taylorian expansion,
(c) all of the cumulants are finite,                                      (I)
(d) $t_n$ is consistent, i.e. $\theta = \lim_{n\to\infty} t_n(\kappa_1, \ldots, \kappa_m)$,

it follows that

$$t_n - \theta = \Sigma(k_i - \kappa_i)\frac{\partial t_n}{\partial\kappa_i} + \tfrac12\Sigma\Sigma(k_i - \kappa_i)(k_j - \kappa_j)\frac{\partial^2 t_n}{\partial\kappa_i\,\partial\kappa_j} + \ldots$$

Since the moments of the estimators, $k_i$, are power series in $1/n$, it follows that $E(t_n - \theta)$, i.e. the bias in $t_n$, is also expressible as a power series in $1/n$.













354 Notes on bias in estimation 


The conditions (I) are undoubtedly more stringent than they need be. For instance, the
higher cumulants need not exist. Further, it appears likely that I(b) is a necessary condition
if I(d) is to hold. However, the main point is that for a wide variety of statistics it
is true that
$$E(t_n) - \theta = \frac{a_1}{n} + \frac{a_2}{n^2} + \frac{a_3}{n^3} + \ldots$$

3. If $t'_n = n t_n - (n-1)t_{n-1}$, then
$$E(t'_n) - \theta = -\frac{a_2}{n^2} - \ldots,$$
and hence $t'_n$ is biased to order $1/n^2$ only. Similarly, $t''_n = [n^2 t'_n - (n-1)^2 t'_{n-1}]/[n^2 - (n-1)^2]$
is biased to order $1/n^3$, and so on.

Alternatively, it is possible to use the statistics calculated from any subset of the
observations to achieve corrections for bias. A further approach of particular interest occurs
when $n = 2p$. Here, we may use $t'_{2p} = 2t_{2p} - t_p$ as being free from bias to order $1/n^2$.

Procedures such as these may supply approximate corrections for bias provided that
efficiency of estimation is not lost in the process. To achieve this in general, it is necessary
to use $\bar t_{n-1}$, the average of estimates from all possible sets of $n-1$ observations, instead of
$t_{n-1}$, and similarly $\bar t'_{n-1}$ instead of $t'_{n-1}$, etc. With this provision, it appears likely that little,
if any, loss of efficiency will result.
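The first-order correction $n t_n - (n-1)\bar t_{n-1}$, with $\bar t_{n-1}$ taken as the average over all sets of $n-1$ observations, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `jackknife`, `variance_divisor_n` and `variance_divisor_n_minus_1` and the sample values are hypothetical, not taken from the paper. The check uses the variance estimator with divisor $n$, for which the correction should reproduce the divisor $n-1$ estimator.

```python
from statistics import mean


def jackknife(estimator, xs):
    """Return n*t_n - (n-1)*(mean of the n leave-one-out estimates)."""
    n = len(xs)
    t_n = estimator(xs)
    # One estimate per omitted observation; their mean plays the role of t-bar_{n-1}.
    leave_one_out = [estimator(xs[:i] + xs[i + 1:]) for i in range(n)]
    return n * t_n - (n - 1) * mean(leave_one_out)


def variance_divisor_n(xs):
    """Sum of squares about the mean divided by n (biased for sigma^2)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def variance_divisor_n_minus_1(xs):
    """Sum of squares about the mean divided by n-1 (unbiased for sigma^2)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


# Hypothetical illustrative data (a plain list, so slicing works as above).
sample = [2.1, 0.4, 3.3, 1.7, 2.9, 0.8]
corrected = jackknife(variance_divisor_n, sample)
print(corrected, variance_divisor_n_minus_1(sample))
```

Any estimator that accepts a list can be passed to `jackknife`; the cost is $n$ extra evaluations of the estimator.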


4. For instance, many of the statistics $t_n$ may be derived from an estimation procedure
of form
$$\sum_{i=1}^{n} G(x_i, t_n) = 0.$$

The variance of $t_n$ to the first order may be estimated from this equation by a $\delta$ technique
such as has been described by Weatherburn (1952, pp. 130 et seq.) and Kendall (1943,
pp. 208 et seq.). In the simplest instance it is possible to represent the argument as follows.
If $\mu = E(x_i)$, then
$$n\,\delta t_n = H(\theta)\sum_{i=1}^{n}\delta x_i,$$


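where
$$H(\theta) = -E\left[\frac{\partial}{\partial\mu}G(\mu,\theta)\right]\Big/E\left[\frac{\partial}{\partial\theta}G(\mu,\theta)\right],$$
if both expectations exist. (If they do not exist, then generally the basic equation is changed
to one of the form
$$n\,\delta t_n = \sum_{i=1}^{n}\delta H(x_i,\theta),$$
to which a similar argument may be applied.)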


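Thus
$$\operatorname{var} t_n = \frac{[H(\theta)]^2}{n}\operatorname{var} x,$$
$$(n-1)\,\delta t_{n-1} = H(\theta)\sum_{i\neq j}\delta x_i,$$
$$(n-1)\,\delta\bar t_{n-1} = H(\theta)\,\frac{1}{n}\sum_{j=1}^{n}\sum_{i\neq j}\delta x_i = H(\theta)\,\frac{n-1}{n}\sum_{i=1}^{n}\delta x_i.$$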













M. H. QuENOUILLE 355 


Hence
$$\delta t'_n = n\,\delta t_n - (n-1)\,\delta\bar t_{n-1} = \frac{H(\theta)}{n}\sum_{i=1}^{n}\delta x_i$$
and
$$\operatorname{var} t'_n = \frac{[H(\theta)]^2}{n}\operatorname{var} x = \operatorname{var} t_n$$
to order $1/n$.

This implies that this correction affects the standard error by a factor of $1/n$ at most, i.e.
$$\text{S.E. of } t'_n = (\text{S.E. of } t_n)\{1 + O(1/n)\}.$$

Since, in general, the standard error of $t_n$ will decrease with $n$ (usually proportional to $n^{-1/2}$),
the correction will affect the dispersion of the distribution by $o(1/n)$ (usually $O(n^{-3/2})$),
i.e. by a small amount in comparison with the bias. The reduction in bias achieved by using
$t'_n$ is consequently not obtained at the expense of a comparable increase in the dispersion
of the distribution of the estimator.
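5. As a first illustration, suppose $\theta = \sigma^2$ for a normal distribution, and $t_n = \sum_{i=1}^{n}(x_i - \bar x)^2/n$.
Then
$$t'_n = n t_n - (n-1)\bar t_{n-1} = \frac{\sum\sum_{i<j}(x_i - x_j)^2}{n} - \frac{n-1}{n}\sum_{k=1}^{n}\frac{\sum\sum_{i<j;\ i,j\neq k}(x_i - x_j)^2}{(n-1)^2}$$
$$= \left[\frac{1}{n} - \frac{n-2}{n(n-1)}\right]\sum\sum_{i<j}(x_i - x_j)^2 = \frac{1}{n(n-1)}\sum\sum_{i<j}(x_i - x_j)^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar x)^2,$$
and
$$\operatorname{var} t'_n = \left(\frac{n}{n-1}\right)^2\operatorname{var} t_n.$$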


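Similarly, if $t'_{2p} = 2t_{2p} - \bar t_p$, then
$$t'_{2p} = \frac{1}{2p-1}\sum_{i=1}^{2p}(x_i - \bar x)^2$$
and
$$\operatorname{var} t'_{2p} = \left(\frac{2p}{2p-1}\right)^2\operatorname{var} t_{2p} = \frac{2\sigma^4}{2p-1}.$$

If, alternatively, $t'_{2p}$ is calculated from only one pair of possible sets of $p$ observations, say
$x_1$ to $x_p$ and $x_{p+1}$ to $x_{2p}$, then
$$t'_{2p} = \frac{1}{2p}\sum_{i=1}^{2p}x_i^2 - \frac{1}{p^2}\left(\sum_{i<p+1}x_i\right)\left(\sum_{i>p}x_i\right)$$
and
$$\operatorname{var} t'_{2p} = \frac{p+1}{p^2}\sigma^4.$$

Thus averaging over only one pair of the possible sets results in a decrease in the efficiency
of estimation of
$$\frac{(2p-1)(p+1)}{2p^2} - 1 = \frac{p-1}{2p^2}.$$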




6. As a numerical illustration, suppose that it is desired to estimate $\theta = 1/\mu$ from a
series of observations taken from a normal distribution. Then
$$t_n = n\Big/\sum_{i=1}^{n}x_i$$
and
$$t'_n = \frac{n^2}{\sum_{i=1}^{n}x_i} - \frac{(n-1)^2}{n}\sum_{i=1}^{n}\frac{1}{\sum_{j\neq i}x_j},\qquad \operatorname{var} t_n \sim \operatorname{var} t'_n \sim \frac{\sigma^2}{n\mu^4}.$$
The values in column 1 of Table 1 were random observations from a normal distribution
with $\mu = 2$, $\sigma^2 = 1$ (using the first ten random numbers in Fisher & Yates's (1953) Statistical
Tables).




















Table 1

    x_i      18.32 - x_i     t_{n-1} = 9/(18.32 - x_i)

    0.18       18.14            0.4961
    4.00       14.32            0.6285
    1.04       17.28            0.5208
    0.85       17.47            0.5152
    2.14       16.18            0.5562
    1.01       17.31            0.5199
    3.01       15.31            0.5879
    2.33       15.99            0.5629
    1.57       16.75            0.5373
    2.19       16.13            0.5580

   18.32      164.88            5.4828
             (= 9 x 18.32)

Then $t_n = 1/1.832 = 0.54585$,
$t'_n = 5.4585 - 9 \times 0.54828 = 0.5240$.
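The arithmetic of Table 1 can be retraced with a short Python sketch. The ten observations are those listed in the table; the variable names are illustrative only.

```python
# Observations from column 1 of Table 1.
xs = [0.18, 4.00, 1.04, 0.85, 2.14, 1.01, 3.01, 2.33, 1.57, 2.19]

n = len(xs)
total = sum(xs)                      # column total of x_i

# Estimate of 1/mu from all n observations: t_n = n / sum(x_i).
t_n = n / total

# Estimates from the n sets of n-1 observations: (n-1)/(total - x_i).
leave_one_out = [(n - 1) / (total - x) for x in xs]
t_bar = sum(leave_one_out) / n       # average of the leave-one-out estimates

# Corrected estimate t'_n = n*t_n - (n-1)*t_bar.
t_prime = n * t_n - (n - 1) * t_bar

print(round(total, 2), round(t_n, 5), round(sum(leave_one_out), 4), round(t_prime, 4))
```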


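Here, owing to the high value of $s^2$ (= 1.4), this latter value has been corrected more than
it would be using the exact formula
$$E(t_n) = \frac{n}{\sigma^2}\exp\left(-\frac{n\mu^2}{2\sigma^2}\right)\int_0^{\mu}\exp\left(\frac{nu^2}{2\sigma^2}\right)du \quad\text{(see appendix)}$$
$$= \frac{1}{\mu} + \frac{\sigma^2}{n\mu^3} + \ldots \quad\text{for large } \mu.$$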


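$$t_{n-1} = \frac{n-1}{T - x_i} = \frac{n}{T}\left[1 + \frac{n(x_i - \bar x)}{(n-1)T} + \frac{n^2(x_i - \bar x)^2}{(n-1)^2T^2} + \ldots\right],\quad\text{where } T = \sum_{i=1}^{n}x_i,$$
$$\bar t_{n-1} = \frac{n}{T}\left[1 + \frac{ns^2}{(n-1)T^2} + \ldots\right],\quad\text{with } s^2 = \sum_{i=1}^{n}(x_i - \bar x)^2/(n-1),$$
$$n t_n - (n-1)\bar t_{n-1} = \frac{n}{T} - \frac{n^2s^2}{T^3} - \ldots$$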







In this instance it might be noted that the procedure will probably break down if $n\mu^2/\sigma^2$
is small. This should be apparent from the behaviour of the $t_{n-1}$, which will vary in sign.


7. Consider next an inverse sampling procedure. Suppose the proportion, $\pi$, of individuals
with a given characteristic is to be estimated, and sampling continues until $r$
individuals with the characteristic are observed. Let $n$ be the total number of individuals.

Let $t_n = r/n$, then since the last individual is constrained to have the characteristic, there
are only $n-1$ values of $t_{n-1}$ to be considered, and
$$\bar t_{n-1} = \frac{1}{n-1}\left[(r-1)\frac{r-1}{n-1} + (n-r)\frac{r}{n-1}\right] = \frac{nr - 2r + 1}{(n-1)^2}.$$

Thus
$$t'_n = r - (n-1)\frac{nr - 2r + 1}{(n-1)^2} = \frac{r-1}{n-1},$$
which actually is strictly unbiased. Alternatively, if $1/\pi$ is to be estimated, then $t_n = n/r$, and
$$\bar t_{n-1} = \frac{1}{n-1}\left[(r-1)\frac{n-1}{r-1} + (n-r)\frac{n-1}{r}\right] = \frac{n}{r}.$$

Thus $t'_n = t_n = n/r$, which again is strictly unbiased.
This indicates that the procedure may be useful in sequential estimation or inverse 
sampling. 
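The two inverse-sampling results of this section can be checked in exact rational arithmetic. The sketch below is an illustration only: the function names are hypothetical, and it assumes $r \ge 2$ and $n \ge r$ so that every denominator is non-zero.

```python
from fractions import Fraction


def corrected_proportion(r, n):
    """t'_n for t_n = r/n, averaging over the n-1 admissible omissions."""
    t_n = Fraction(r, n)
    # r-1 omissions remove an individual with the characteristic,
    # n-r omissions remove one without it; the last individual is never omitted.
    t_bar = ((r - 1) * Fraction(r - 1, n - 1) + (n - r) * Fraction(r, n - 1)) / (n - 1)
    return n * t_n - (n - 1) * t_bar


def corrected_reciprocal(r, n):
    """t'_n for t_n = n/r, the estimator of 1/pi."""
    t_n = Fraction(n, r)
    t_bar = ((r - 1) * Fraction(n - 1, r - 1) + (n - r) * Fraction(n - 1, r)) / (n - 1)
    return n * t_n - (n - 1) * t_bar


print(corrected_proportion(3, 10))   # compare with (r-1)/(n-1)
print(corrected_reciprocal(3, 10))   # compare with n/r
```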


8. It is possible to use simple extensions of this procedure to correct the bias in any
combination of statistics, $f(t_n, u_m)$. The statistic
$$nm\,f(t_n, u_m) - (n-1)m\,f(t_{n-1}, u_m) - n(m-1)\,f(t_n, u_{m-1}) + (n-1)(m-1)\,f(t_{n-1}, u_{m-1})$$
is, for example, unbiased to order $1/n^2$ or $1/m^2$, whichever is the greater.

9. In general, where a series of concomitant observations, $y_1, y_2, \ldots, y_n$, are used in
calculating $t_n$, both the bias and the efficiency will depend upon these observations, and
therefore a simple correction of the above type will not be possible.
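An important exception occurs when
$$t_n = \psi\left(\frac{\sum x_i\,\phi(y_i)}{\sum\phi^2(y_i)}\right).  (II)$$

In this instance, if
$$\xi = E\left[\frac{\sum x_i\,\phi(y_i)}{\sum\phi^2(y_i)}\right],\qquad \sigma^2(t_n) = [\psi'(\xi)]^2\operatorname{var} x\Big/\sum\phi^2(y_i)$$
and
$$\frac{n-1}{\sigma^2(t_n)} = \sum\frac{1}{\sigma^2(t_{n-1})}.$$
Then, if $t'_n = n t_n - (n-1)\bar t_{n-1}$, where
$$\bar t_{n-1} = \left[\sum\frac{t_{n-1}}{\sigma^2(t_{n-1})}\right]\Big/\left[\sum\frac{1}{\sigma^2(t_{n-1})}\right],$$
$t'_n$ is unbiased to order $1/n^2$, and $\operatorname{var} t_n = \operatorname{var} t'_n$ to order $1/n^2$. This formula may thus be put
in the alternative forms
$$t'_n = n t_n - \sigma^2(t_n)\sum\frac{t_{n-1}}{\sigma^2(t_{n-1})} = n t_n - \sum_i\left(1 - \frac{\phi^2(y_i)}{\sum_j\phi^2(y_j)}\right)t_{n-1}.$$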



















In a similar manner, if 





sco yi) ae 


eee 1 








— o%t,) mn ~ o%(t,, a) 
and $t'_n = (n-1)t_n - (n-2)\bar t_{n-1}$,
where
$$\bar t_{n-1} = \left[\sum\frac{t_{n-1}}{\sigma^2(t_{n-1})}\right]\Big/\left[\sum\frac{1}{\sigma^2(t_{n-1})}\right],$$
is unbiased to order $1/n^2$.
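It is also possible to split $2p$ observations into groups of $p$. Thus, if equation (II) holds,
$$\frac{1}{\sigma^2(t_{2p})} = \frac{1}{\sigma^2(t_{p1})} + \frac{1}{\sigma^2(t_{p2})}$$
and $t'_{2p} = 2t_{2p} - \bar t_p$, where
$$\bar t_p = \left[\sum\frac{t_p}{\sigma^2(t_p)}\right]\Big/\left[\sum\frac{1}{\sigma^2(t_p)}\right],$$
is unbiased to order $1/n^2$ and has equal asymptotic efficiency to $t_{2p}$.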

If, however, a correction is made for the mean, some loss in efficiency will arise from the
difference between the values of $\phi(y_i)$ for the two groups. If these are equal, the above
approach may be used with no extra loss in efficiency, and, since there is no ordering
involved in statistics of this type, approximate equality of these values may often be
achieved by appropriate selection. For instance, if the $\phi(y_i)$ are ordered and the pairs of
observations corresponding to alternate values are used in the two groups, then both
$\sum\phi(y_i)$ and $\sum\phi^2(y_i)$ will be approximately equal for the two groups and $t'_{2p} = 2t_{2p} - \tfrac12(t_{p1} + t_{p2})$
will frequently be a sufficiently accurate and unbiased statistic.

None of these results is, however, of much practical importance in that the bias may be
calculated directly using $\psi''(\xi)\,\sigma^2(t_n)/2[\psi'(\xi)]^2$. Their interest lies in that they indicate that
the same corrections applied to time-series statistics may be adequate for many purposes.
This has already been suggested elsewhere (Quenouille, 1949).










10. Consider first the application of these corrections in the estimation of a serial
covariance, say the $p$th. Here, if the mean is known, say 0, an unbiased estimator exists:
$$t_n = (x_1x_{p+1} + x_2x_{p+2} + \ldots + x_{n-p}x_n)/(n-p),$$
with variance $\sigma^4/(n-p)$ in the case where no correlation exists in the series, and variance
$\sigma^4A/(n-p)$ otherwise, where $A$ depends upon the correlations between the products in
this expression.

There exist also $n$ estimators based upon $n-1$ observations. Denoting the estimator
which omits the $i$th observation by $t_{n-1,i}$, this has $n-p-1$ terms if $i \le p$ or $i \ge n-p+1$,
and $n-p-2$ terms otherwise. The variance of $t_{n-1,i}$ is correspondingly $\sigma^4/(n-p-1)$ or
$\sigma^4/(n-p-2)$ when no correlation exists in the series, and $\sigma^4A_i/(n-p-1)$ or $\sigma^4A_i/(n-p-2)$
if correlation exists. Here, $A_i$ will differ for different $i$, but for large $n$ it will approximate
to $A$ for all $i$.


If there is no serial correlation or if we ignore the differences in $A_i$, the analysis then gives
$$\bar t_{n-1} = \left[\sum_i\frac{t_{n-1,i}}{\sigma^2(t_{n-1,i})}\right]\Big/\left[\sum_i\frac{1}{\sigma^2(t_{n-1,i})}\right] = \frac{(n-2)(n-p)\,t_n}{n(n-p-2) + 2p} = t_n,$$
$$t'_n = t_n.$$










Thus the correction does not affect the estimate. 
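A small numerical sketch makes this invariance concrete. It assumes the no-correlation case, so that each estimator from $n-1$ observations is weighted by its number of product terms (the reciprocal of its variance up to a constant). The helper names and the data are hypothetical, and Python indices start at 0 rather than 1.

```python
def lag_products(xs, p, omit=None):
    """Products x_i * x_{i+p}, dropping any product that involves index `omit`."""
    n = len(xs)
    return [xs[i] * xs[i + p] for i in range(n - p) if omit not in (i, i + p)]


def serial_cov(xs, p, omit=None):
    """Mean of the lag-p products (mean of the series assumed known and zero)."""
    prods = lag_products(xs, p, omit)
    return sum(prods) / len(prods)


def corrected_serial_cov(xs, p):
    """n*t_n - (n-1)*t_bar with t_bar weighted by the number of terms."""
    n = len(xs)
    t_n = serial_cov(xs, p)
    weights = [len(lag_products(xs, p, omit=i)) for i in range(n)]
    weighted_sum = sum(w * serial_cov(xs, p, omit=i) for i, w in enumerate(weights))
    t_bar = weighted_sum / sum(weights)
    return n * t_n - (n - 1) * t_bar, sum(weights)


# Hypothetical series of n = 12 values with lag p = 2.
series = [0.3, -1.2, 0.8, 0.1, -0.4, 1.5, -0.9, 0.2, 0.7, -0.6, 1.1, -0.3]
t_corrected, total_weight = corrected_serial_cov(series, 2)
print(t_corrected, serial_cov(series, 2), total_weight)
```

The total weight printed at the end can be compared with the denominator $n(n-p-2) + 2p$ of the expression above.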

If the variation in A, is taken into account, a slightly different estimate is obtained. This 
is still unbiased, but is less efficient (though asymptotically of equal efficiency) probably 
as a consequence of the individual estimators not being fully efficient. For example, the 
products in t,, are correlated with one another, and hence the end products contain informa- 
tion ON £p_»+1%p41, ete. The end-products should thus receive slightly greater weight for 
efficient estimation. Similarly, some of the products in ¢,,_, ; should receive greater weight. 
With these provisions, it appears likely that the above procedure would not lead to any loss 
in efficiency, i.e. the slight loss in efficiency (asymptotically zero) which occurs results from 
the use of inefficient estimates. This obviously is of no practical importance. 

















The effectiveness of corrections of this type in the general case is more difficult to prove.
Their effectiveness might be demonstrated by considering the correction for the mean in
the estimation of serial covariance.
If
$$t_n = \frac{1}{n-p}\left[\sum_{i=1}^{n-p}x_ix_{i+p} - \frac{1}{n-p}\left(\sum_{i=1}^{n-p}x_i\right)\left(\sum_{i=1}^{n-p}x_{i+p}\right)\right],$$
then, when the $x_i$ are uncorrelated,
$$E(t_n) = -\frac{n-2p}{(n-p)^2}\sigma^2 = -\frac{\sigma^2}{n}\left[1 + O\left(\frac{1}{n^2}\right)\right].$$
—_ The expectations of the t,_, will vary. For instance, there will be a few terms of the type 
+ to) n—p n—p 
(Zs) (22) 
| : z 4 i+p = Mais 2 
b R.d<-— = a wa Eg FT 1+0 i ‘ 
vy be ie (n—p—1)? (n—p-—1)? n—1 n? 
that d 
oses. sa = em > 
B( 2, + 2) (2.1 + >. 20) co 1 
a = = sii Lads cecaaeceed Sak 
but the majority ($n-4p$ out of $n$) will be of the form
$$E(t_{n-1,m}) = -\frac{E\left[\left(\sum_{i=1}^{n-p}x_i - x_{m-p} - x_m\right)\left(\sum_{i=1}^{n-p}x_{i+p} - x_m - x_{m+p}\right)\right]}{(n-p-2)^2} = -\frac{n-2p-3}{(n-p-2)^2}\sigma^2 = -\frac{\sigma^2}{n-1}\left[1 + O\left(\frac{1}{n^2}\right)\right].$$
Thus
$$E(\bar t_{n-1}) = -\frac{\sigma^2}{n-1}\left[1 + O\left(\frac{1}{n^2}\right)\right]\quad\text{and}\quad E(t'_n) = O\left(\frac{1}{n^2}\right),$$
as required.
Similar results will hold for the serial correlation coefficients. It is, however, an open
question as to whether the extra computation involved in calculating and using $\bar t_{n-1}$ is
warranted compared with that involved in calculating and using $t_{n/2}$. It appears likely
that the use of the two half-series is sufficiently accurate for most practical purposes, though
this point requires further investigation.










REFERENCES 


FISHER, R. A. & YATES, F. (1953). Statistical Tables for Biological, Agricultural and Medical Research. Edinburgh: Oliver and Boyd.

KENDALL, M. G. (1943). The Advanced Theory of Statistics, 1. London: Griffin and Co.

QUENOUILLE, M. H. (1949). J. R. Statist. Soc. B, 11, 68-84.

WEATHERBURN, C. E. (1952). Mathematical Statistics. Cambridge University Press.


APPENDIX 
Proof of a formula in §6 


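$$E(t_n) = \int_{-\infty}^{\infty}\frac{1}{\bar x}\,\frac{\sqrt n}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{n(\bar x - \mu)^2}{2\sigma^2}\right\}d\bar x = \frac{\sqrt n}{\sigma}\int_{-\infty}^{\infty}\frac{1}{y\sqrt{2\pi}}\exp\{-\tfrac12(y - a)^2\}\,dy,$$
where
$$a = \mu\sqrt n/\sigma,\qquad y = \bar x\sqrt n/\sigma.$$
Let
$$I(a) = \int_{-\infty}^{\infty}\frac{1}{y\sqrt{2\pi}}\exp\{-\tfrac12(y - a)^2\}\,dy,$$
then
$$\exp(\tfrac12a^2)\,I(a) = \int_{-\infty}^{\infty}\frac{1}{y\sqrt{2\pi}}\exp(-\tfrac12y^2 + ay)\,dy,$$
$$\frac{d}{da}\left[\exp(\tfrac12a^2)\,I(a)\right] = \int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}}\exp(-\tfrac12y^2 + ay)\,dy = \exp(\tfrac12a^2).$$
Thus
$$\exp(\tfrac12a^2)\,I(a) = \int_0^a\exp(\tfrac12a^2)\,da,$$
the limits being determined by the fact that $I(a) = 0$ when $a = 0$.
Therefore
$$I(a) = \exp(-\tfrac12a^2)\int_0^a\exp(\tfrac12a^2)\,da$$
and
$$E(t_n) = \frac{n}{\sigma^2}\exp\left(-\frac{n\mu^2}{2\sigma^2}\right)\int_0^{\mu}\exp\left(\frac{nu^2}{2\sigma^2}\right)du.$$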
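As a rough numerical check, the closed form $E(t_n) = (n/\sigma^2)\exp(-n\mu^2/2\sigma^2)\int_0^\mu \exp(nu^2/2\sigma^2)\,du$ can be evaluated by Simpson's rule and compared with the leading terms $1/\mu + \sigma^2/(n\mu^3)$ quoted in §6. The sketch below is an assumption-laden illustration: the function name, the step count and the parameter values ($\mu = 2$, $\sigma = 1$, $n = 10$, as in the numerical example) are choices made here, and the two exponentials are combined inside the integrand to avoid overflow.

```python
from math import exp


def expected_reciprocal_mean(mu, sigma, n, steps=4000):
    """Evaluate (n/sigma^2) * integral_0^mu exp(n*(u^2 - mu^2)/(2*sigma^2)) du by Simpson's rule."""
    if steps % 2:
        steps += 1                      # Simpson's rule needs an even number of intervals
    h = mu / steps

    def integrand(u):
        # exp(-n*mu^2/(2*sigma^2)) has been folded into the integrand.
        return exp(n * (u * u - mu * mu) / (2 * sigma * sigma))

    total = integrand(0.0) + integrand(mu)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(k * h)
    return n / (sigma * sigma) * total * h / 3


mu, sigma, n = 2.0, 1.0, 10
closed_form = expected_reciprocal_mean(mu, sigma, n)
two_term_series = 1 / mu + sigma ** 2 / (n * mu ** 3)
print(closed_form, two_term_series)
```

The small gap between the two printed values comes from the higher-order terms omitted in the series.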















AN INTRODUCTION TO SOME NON-PARAMETRIC 
GENERALIZATIONS OF ANALYSIS OF VARIANCE 
AND MULTIVARIATE ANALYSIS* 


By 8. N. ROY anp 8S. K. MITRA 


Institute of Statistics, University of North Carolina 


It is clear that a p-variate body of data arranged in a q-way classification will formally look like a 
(p + q)-dimensional contingency table, but a distinction can be made between a ‘variate’ and a ‘way
of classification’ in that along the direction of a ‘variate’ the marginal frequencies are supposed to be 
stochastic variates while along a ‘way of classification’ the marginal frequencies are supposed to be 
fixed. When, along certain directions, the marginals are fixed, an approach based on a conditional 
probability argument has been used. 

In the present paper (i) the conditional probability approach is abandoned and we start either from 
a single multinomial distribution or a product of an appropriate number of different multinomial
distributions according as, with multi-way frequency data, all ways are ‘variates’ or some are 
‘variates’ and some are ‘ways of classification’. (ii) Also the hypotheses that are posed are of different 
kinds altogether according as we have a ‘multivariate analysis’ situation or an ‘analysis of variance’ 
situation. The hypotheses that are meaningful for one situation would not be too meaningful for the 
other and vice versa. Since the conditional probability approach is altogether abandoned, the mathe- 
matical theorems to which appeal is made are the two theorems as stated and proved by Cramér 
(1946, chapter 30) and a number of other such theorems which have been proved the same way and 
which, between them, take care of all the hypotheses discussed in this paper. When all ways are 
‘variates’ the hypotheses are analogous to those in the usual multivariate analysis, and when some 
ways are ‘variates’ and some are ‘ways of classification’ the hypotheses are analogous to those in the 
analysis of variance. 

The general methods discussed in this paper arose out of an attempt to analyse a large mass of 
categorical data. The analysis has been carried out, and a few typical cases (together with the numerical 
analysis) illustrating different parts of the theoretical development will be presented in a subsequent 
paper. In this paper only the large-sample tests are considered. How ‘large’ the sample size has to be for 
the validity of the use of these asymptotic techniques, or in other words, some results on the nature 
of the approximation involved will also be discussed in a later paper. 


1. INTRODUCTION 


After this paper had been written the attention of the authors was drawn to the papers by 
Barnard (1947) and Pearson (1947), which the authors have since studied. Although it is 
only the 2x 2 table with which Barnard is exclusively and Pearson is mostly concerned, 
their logical approach to the whole problem is exactly the same as the authors', and they
have stated the whole position with admirable lucidity. The authors feel that in this general
area, aside from the pioneering work of Karl Pearson, of Fisher on $\chi^2$ (1922) and of Yule &
Kendall (1950) on association, the four conspicuous landmarks in recent times must be
Cramér’s chapter on $\chi^2$ (1946, chapter 30), Neyman’s paper in the same area (1949) and the
two papers by Barnard and Pearson just referred to. In historical perspective, the present 
paper might, therefore, be regarded as an attempt at a somewhat systematic elaboration 
and application of the earlier ideas of Barnard and Pearson (i) to (mostly) one population 
multivariate analysis, starting from a single multinomial distribution and framing hypo- 
theses appropriate to multivariate analysis situations, and (ii) to analysis of variance, 
starting from the product of an appropriate number of separate multinomial distributions 
and framing hypotheses appropriate to analysis of variance situations. 


* This research was supported in part by the United States Air Force, through the Office of Scientific 
Research of the Air Research and Development Command. 













362 Non-parametric generalizations 


The authors (presumably for the same reasons as Barnard and Pearson) abandon the
conditional probability approach for a physical and a mathematical reason. The physical
reason is that keeping the marginals fixed in the sense of conditional probability is seldom
experimentally realizable, and although the authors are not dogmatic about this point,
they would prefer to keep their probability statements as close as possible to experimentally
realizable processes. The mathematical reason is the following. Let $n_i$ ($i = 1, 2, \ldots, r$) be
the observed frequency in the $i$th cell and $p_i$ be the probability (under any hypothesis) of
getting an observation in the $i$th cell, and let the observations be independent in probability.
Let $\sum_i n_i = n$ be supposed to be fixed, $p_i > 0$ and $\sum_i p_i = 1$. Then the unconditional
likelihood function $\phi$ is $n!\prod_i p_i^{n_i}/\prod_i n_i!$. Next let us denote by $\mathbf{n}'$ the row vector $(n_1, \ldots, n_r)$ and
let $A$ be an $s \times r$ ($s < r$) matrix of constants of rank, say $t \le s$, and let us suppose that the $n_i$'s
are subject to the constraints $A\mathbf{n} = \mathbf{a}^0$, a constant vector. Then the conditional likelihood
function will be given by


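$$\phi = \left[\prod_i p_i^{n_i}\Big/\prod_i n_i!\right]\Big/\sum_{A\mathbf{n} = \mathbf{a}^0}\left[\prod_i p_i^{n_i}\Big/\prod_i n_i!\right].  (1-1)$$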


Assume, for simplicity of discussion, that the $p_i$'s are completely specified by the hypothesis
(although this makes no essential difference to the argument). Is $\sum_{i=1}^{r}(n_i - np_i)^2/(np_i)$, in
large samples, distributed as $\chi^2$ with degrees of freedom $(r-1) - t$, no matter what $A$ might
be, provided that it is of rank $t$? If the answer to this question is yes, then the customary
linear estimation or testing of linear hypotheses (in the sense of least squares and analysis
of variance) can be carried out, starting from (1-1) and eventually using the $\chi^2$-test. One of
the authors (Mitra) has constructed an $A$ matrix such that under a $\phi$ (of 1-1) with that $A$
matrix, the limiting distribution of $\sum_i (n_i - np_i)^2/(np_i)$ is not a $\chi^2$-distribution. This, of
course, will not affect the validity of the applications, mostly simple, actually made over the
last fifty years starting from (1-1), because, in all these applications, $A$ is such that we have
the $\chi^2$-distribution. But it would be mathematically unsafe to try to set up a general theory
of testing of linear hypotheses, starting from (1-1), to say nothing of its being physically
unsatisfactory. We do not know under what restrictions on $A$ the distribution will be $\chi^2$,
but the authors have shown that at least those $A$'s are permissible under which, instead of
starting from (1-1), we could also have abandoned the conditional probability approach and
started from the product of a number of different multinomial distributions. The invalidating
example will be discussed in a later paper.


2. PROBLEMS IN A TWO-WAY TABLE

To fix our ideas, consider first a two-way, say $r \times s$, table with observed frequencies $n_{ij}$
in the $(ij)$th cell ($i = 1, 2, \ldots, r$; $j = 1, 2, \ldots, s$). Also let
$$\sum_i n_{ij} = n_{0j},\quad \sum_j n_{ij} = n_{i0}\quad\text{and}\quad \sum_{i,j} n_{ij} = n_{00} = n\ \text{(say)}.$$
i I 


2-1. Both ‘i’ and ‘j’ are ‘variates’ 


Assume that we have a sample of $n$ independent observations such that $p_{ij}$ is
the probability of an observation in the $(ij)$th cell, and $n$ is fixed from sample to
sample. Also let $\sum_j p_{ij} = p_{i0}$, $\sum_i p_{ij} = p_{0j}$, $\sum_{i,j} p_{ij} = p_{00} = 1$. Then the likelihood function
is given by
$$\phi = \frac{n!}{\prod_{i,j} n_{ij}!}\prod_{i,j} p_{ij}^{n_{ij}}.  (2-1)$$
The composite hypothesis we shall be interested in testing is that ‘i’ and ‘j’ are independent,
that is, $H_0$: $p_{ij} = p_{i0}\,p_{0j}$ against $H \ne H_0$, where the $p_{i0}$'s and $p_{0j}$'s are arbitrary positive
nuisance parameters subject to $\sum_i p_{i0} = \sum_j p_{0j} = 1$. This is the analogue of the hypothesis
of no correlation in a bivariate normal population. Under $H_0$ we shall have the likelihood
function $\phi_0$ given by
$$\phi_0 = \frac{n!}{\prod_{i,j} n_{ij}!}\prod_{i,j}(p_{i0}\,p_{0j})^{n_{ij}} = \frac{n!}{\prod_{i,j} n_{ij}!}\prod_i p_{i0}^{n_{i0}}\prod_j p_{0j}^{n_{0j}}.  (2-1-1)$$
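The second equality in (2-1-1) says that, under independence, the likelihood depends on the table only through its marginal totals. A brief numerical sketch of that factorisation follows; the table and the marginal probabilities are hypothetical values chosen purely for illustration, and logarithms are used to keep the numbers small.

```python
from math import log, isclose

# Hypothetical 2 x 3 table of observed frequencies n_ij.
table = [[3, 1, 4],
         [1, 5, 9]]
# Hypothetical marginal probabilities p_i0 and p_0j (each set sums to 1).
p_row = [0.4, 0.6]
p_col = [0.2, 0.3, 0.5]

row_totals = [sum(row) for row in table]            # n_i0
col_totals = [sum(col) for col in zip(*table)]      # n_0j

# log of prod_{i,j} (p_i0 * p_0j)^{n_ij}, computed cell by cell.
cellwise = sum(n_ij * log(p_row[i] * p_col[j])
               for i, row in enumerate(table)
               for j, n_ij in enumerate(row))

# log of prod_i p_i0^{n_i0} * prod_j p_0j^{n_0j}, computed from the margins only.
from_margins = (sum(n * log(p) for n, p in zip(row_totals, p_row))
                + sum(n * log(p) for n, p in zip(col_totals, p_col)))

print(cellwise, from_margins)
```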





2-2. ‘i’ is a ‘way of classification’ and ‘j’ is a ‘variate’

Assume that we have $r$ independent sets of sizes $n_{10}, n_{20}, \ldots, n_{r0}$ of independent observations
such that $n_{i0}$ ($i = 1, 2, \ldots, r$) is fixed from sample to sample and $p_{ij}$ is the probability
of an observation in the $(ij)$th cell. Also we notice that $\sum_j p_{ij} = p_{i0} = 1$. Then the likelihood
function is given by
$$\phi = \prod_i\left[\frac{n_{i0}!}{\prod_j n_{ij}!}\prod_j p_{ij}^{n_{ij}}\right].  (2-2)$$





The composite hypothesis we shall be interested in testing is that $p_{ij}$, for any $j$, is independent
of ‘i’, or in other words, $H_0$: $p_{ij} = q_{0j}$ (say) against $H \ne H_0$, where the $q_{0j}$'s are arbitrary positive
nuisance parameters subject to $\sum_j q_{0j} = \sum_j p_{ij} = p_{i0} = 1$. This is the analogue of the hypothesis
of the equality of means for $r$ homoscedastic univariate normal populations. Under $H_0$ we
shall have
$$\phi_0 = \prod_i\left[\frac{n_{i0}!}{\prod_j n_{ij}!}\prod_j q_{0j}^{n_{ij}}\right] = \frac{\prod_i n_{i0}!}{\prod_{i,j} n_{ij}!}\prod_j q_{0j}^{n_{0j}}.  (2-2-1)$$

This $\phi_0$ could also have been obtained from the $\phi$ of (2-1), by putting $H_0$: $p_{ij} = p_{i0}\,p_{0j}$ and
then finding under $H_0$ the conditional probability subject to the $n_{i0}$'s being fixed. But it seems
to the authors that physically this is far less realistic than the model here used, although
historically this is more or less what has been used so far.

The case of ‘i’ being a ‘variate’ and ‘j’ a ‘way of classification’ is exactly similar and 
need not be separately considered. 


2-3. Both ‘i’ and ‘j’ are ‘ways of classification’

Here we have a sampling scheme in which the $n_{i0}$'s and $n_{0j}$'s ($i = 1, 2, \ldots, r$; $j = 1, 2, \ldots, s$) are
supposed to be fixed from sample to sample. In this situation, on the hypothesis of independence
between ‘i’ and ‘j’, we can write down the likelihood function $\phi_0$ without assuming
that the observations are independent. For this we start from an urn problem model in
which there is an urn containing $n_{10}, n_{20}, \ldots, n_{r0}$ balls of $r$ different colours from which we
draw successively without replacement $n_{01}, n_{02}, \ldots, n_{0s}$ balls (with $\sum_i n_{i0} = \sum_j n_{0j} = n$).

t j 













The joint probability that the $j$th set of $n_{0j}$ balls will contain $n_{1j}, n_{2j}, \ldots, n_{rj}$ balls of different
colours (with $j = 1, 2, \ldots, s$) will be given by
$$\phi_0 = \prod_i n_{i0}!\prod_j n_{0j}!\Big/\left(n!\prod_{i,j} n_{ij}!\right).  (2-3)$$

The great advantage of this scheme is that the different observations need not be assumed
to be independent, and the great disadvantage is that it is not clear what is the form of $\phi$
under a general $H$ as distinct from the null hypothesis $H_0$ of independence between ‘i’ and ‘j’.
This means that the power of a test for $H_0$ against alternatives cannot be obtained and also
that it is not possible to obtain a one-tailed $\chi^2$-test for $H_0$ using the same kind of heuristic
arguments that we shall use in the first two situations. A one-tailed $\chi^2$-test can be found here
by analogy with what is done in the first two cases.

This $\phi_0$ could also have been obtained from the $\phi$ of (2-1), by putting $H_0$: $p_{ij} = p_{i0}\,p_{0j}$ and
then finding under $H_0$ the conditional probability subject to the $n_{i0}$'s and $n_{0j}$'s being fixed. But
this would be less realistic than the model here used and would deprive (2-3) of the one great
advantage it possesses in that the successive observations do not have to be independent.
Notice that (2-1) is based on the observations being all independent.

It will be seen that the approach here is not one of conditional probability at all. It will 
also be seen that there are three different sampling schemes each leading in a natural way 
to a particular probability model and a particular type of hypothesis to be tested. From a 
physical standpoint it would not be proper to break this tie and use a particular probability 
model and test a particular type of hypothesis when the sampling scheme is something 
different. It will be noticed that in most situations of life the natural sampling schemes are 
those of (i) or (ii), but there are situations, e.g. Fisher’s tea-tasting experiments or those 
connected with the extra-sensory perception experiments or with the claims of astrologers 
as to prediction, etc., where (iii) might be a natural sampling scheme. 


3. PROBLEMS IN A THREE-WAY TABLE

As a natural extension of a two-way table consider a three-way $r \times s \times t$ table with observed
frequencies $n_{ijk}$ in the $(ijk)$th cell ($i = 1, 2, \ldots, r$; $j = 1, 2, \ldots, s$; $k = 1, 2, \ldots, t$). Also let
$$\sum_i n_{ijk} = n_{0jk},\quad \sum_j n_{ijk} = n_{i0k},\quad \sum_k n_{ijk} = n_{ij0},\quad \sum_{i,j} n_{ijk} = n_{00k},$$
$$\sum_{i,k} n_{ijk} = n_{0j0},\quad \sum_{j,k} n_{ijk} = n_{i00},\quad \sum_{i,j,k} n_{ijk} = n_{000} = n\ \text{(say)}.$$

3-1. ‘i’, ‘j’ and ‘k’ all ‘variates’

Assume that we have a sample of $n$ independent observations such that $p_{ijk}$ is the
probability of an observation in the $(ijk)$th cell and $n$ is fixed from sample to sample. Also let
$$\sum_i p_{ijk} = p_{0jk},\quad \sum_j p_{ijk} = p_{i0k},\quad \sum_k p_{ijk} = p_{ij0},\quad \sum_{i,j} p_{ijk} = p_{00k},$$
$$\sum_{i,k} p_{ijk} = p_{0j0},\quad \sum_{j,k} p_{ijk} = p_{i00},\quad \sum_{i,j,k} p_{ijk} = p_{000} = 1.$$
The likelihood function will be given by
$$\phi = \frac{n!}{\prod_{i,j,k} n_{ijk}!}\prod_{i,j,k} p_{ijk}^{n_{ijk}}.  (3-1)$$
In this case we shall be interested in testing a class of composite hypotheses, a typical one
being the composite




3-1a. Hypothesis of conditional independence between ‘i’ and ‘j’ for fixed ‘k’

This will be
$$H_0:\ \frac{p_{ijk}}{p_{00k}} = \frac{p_{i0k}}{p_{00k}}\,\frac{p_{0jk}}{p_{00k}}\quad\text{or}\quad p_{ijk} = \frac{p_{i0k}\,p_{0jk}}{p_{00k}}  (3-1-1)$$
against $H \ne H_0$ ($i = 1, \ldots, r$; $j = 1, \ldots, s$; $k = 1, \ldots, t$).

This is the analogue of the hypothesis of no partial correlation between $x$ and $y$, given $z$,
in a three-variate normal population. It is easy to see that if we superimpose on this the
composite hypothesis of independence between ‘i’ and ‘k’, and between ‘j’ and ‘k’, i.e.
$$p_{i0k} = p_{i00}\,p_{00k}\quad\text{and}\quad p_{0jk} = p_{0j0}\,p_{00k},  (3-1-2)$$
which is the analogue of the hypothesis of no total correlation between $x$ and $z$, and between
$y$ and $z$, in a three-variate normal population, we should have
$$p_{ijk} = p_{i00}\,p_{0j0}\,p_{00k},  (3-1-3)$$
which is the condition of complete independence of ‘i’, ‘j’ and ‘k’.

We shall also be interested in another class of composite hypotheses, the typical one
being the composite

3-1b. Hypothesis of independence between ‘(i, j)’ and ‘k’

This will be
$$H_0:\ p_{ijk} = p_{ij0}\,p_{00k}\quad\text{against } H \ne H_0\quad (i = 1, \ldots, r;\ j = 1, \ldots, s;\ k = 1, \ldots, t).  (3-1-4)$$

This is the analogue of the hypothesis of no multiple correlation between $(x, y)$ and $z$ in
a three-variate normal population.

It is easy to check by summing the two sides of the above equation over $i$ and also over $j$
separately that (3-1-4) implies the composite hypotheses
$$p_{i0k} = p_{i00}\,p_{00k}\quad\text{and}\quad p_{0jk} = p_{0j0}\,p_{00k}.  (3-1-5)$$
But (3-1-5) will not imply (3-1-4). It has been shown by Roy & Kastenbaum (1956) that
it is necessary to superimpose on (3-1-5) the additional hypothesis
$$H_0:\ p_{ijk} = q_{ij0}\,q_{i0k}\,q_{0jk},  (3-1-6)$$
to obtain (3-1-4), where $q_{ij0}$, $q_{i0k}$, $q_{0jk}$ are defined to be arbitrary positive functions of $(i, j)$,
$(i, k)$ and $(j, k)$ with no summation convention connecting them as in the case of the $p_{ijk}$'s.
By analogy with analysis of variance $H_0$ will be called the hypothesis of ‘no interaction’.
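That (3-1-5) does not by itself give (3-1-4) can be seen from a small constructed example. The 2 x 2 x 2 table below is an assumption made for illustration (it is not taken from the paper): every cell probability is 1/8 disturbed by a term whose sign alternates with $i + j + k$, so that all two-way margins stay uniform. Indices run from 0 in the code.

```python
from itertools import product

EPS = 0.05  # hypothetical size of the alternating disturbance
cells = list(product(range(2), repeat=3))
p = {(i, j, k): 0.125 + EPS * (-1) ** (i + j + k) for (i, j, k) in cells}


def p_i0k(i, k): return sum(p[i, j, k] for j in range(2))
def p_0jk(j, k): return sum(p[i, j, k] for i in range(2))
def p_ij0(i, j): return sum(p[i, j, k] for k in range(2))
def p_i00(i): return sum(p[i, j, k] for j in range(2) for k in range(2))
def p_0j0(j): return sum(p[i, j, k] for i in range(2) for k in range(2))
def p_00k(k): return sum(p[i, j, k] for i in range(2) for j in range(2))


TOL = 1e-12
# (3-1-5): 'i' independent of 'k', and 'j' independent of 'k'.
holds_3_1_5 = all(abs(p_i0k(i, k) - p_i00(i) * p_00k(k)) < TOL
                  and abs(p_0jk(j, k) - p_0j0(j) * p_00k(k)) < TOL
                  for (i, j, k) in cells)
# (3-1-4): '(i, j)' independent of 'k'.
holds_3_1_4 = all(abs(p[i, j, k] - p_ij0(i, j) * p_00k(k)) < TOL
                  for (i, j, k) in cells)

print(holds_3_1_5, holds_3_1_4)
```

In this table each $p_{ij0}\,p_{00k}$ equals 1/8, whereas the cells themselves are 0.175 or 0.075, so (3-1-4) fails although both parts of (3-1-5) hold.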


3-2. ‘i’ and ‘j’ are ‘variates’ and ‘k’ a ‘way of classification’

Assume that we have $t$ independent sets of sizes $n_{001}, \ldots, n_{00t}$ of independent observations
such that $n_{00k}$ ($k = 1, \ldots, t$) is fixed from sample to sample and $p_{ijk}$ is the probability of an
observation in the $(ijk)$th cell. Notice that $\sum_{i,j} p_{ijk} = p_{00k} = 1$. The likelihood function will
be given by
$$\phi = \prod_k\left[\frac{n_{00k}!}{\prod_{i,j} n_{ijk}!}\prod_{i,j} p_{ijk}^{n_{ijk}}\right].  (3-2)$$

Here we shall be interested in testing the composite
3-2a. Hypothesis of independence between ‘i’ and ‘j’ for each ‘k’, that is,
$$H_0:\ p_{ijk} = p_{i0k}\,p_{0jk}\quad\text{against } H \ne H_0\quad (i = 1, \ldots, r;\ j = 1, \ldots, s;\ k = 1, \ldots, t).  (3-2-1)$$















If we superimpose on this the composite hypothesis that the marginal ‘i’ (obtained by
summing over ‘j’) is independent of ‘k’ and similarly for ‘j’, that is,
$$p_{i0k} = q_{i00}\ \text{(say)}\quad\text{and}\quad p_{0jk} = q_{0j0}\ \text{(say)},  (3-2-2)$$
we should have
$$p_{ijk} = q_{i00}\,q_{0j0}.  (3-2-3)$$
Notice from (3-2-2) that $\sum_i q_{i00} = \sum_i p_{i0k} = p_{00k} = 1$ and also that $\sum_j q_{0j0} = \sum_j p_{0jk} = p_{00k} = 1$.
We shall also be interested in the composite

3-2b. Hypothesis that $p_{ijk}$ is independent of ‘k’, that is,
$$H_0:\ p_{ijk} = q_{ij0}\ \text{(say)}\quad\text{against } H \ne H_0\quad\text{(for all } i, j \text{ and } k).  (3-2-4)$$

This is the analogue of the hypothesis of the equality of $t$ mean vectors (each consisting
of two components) for $t$ bivariate normal populations, each having the same
variance-covariance matrix.

Summing over ‘j’ and ‘i’ separately this would imply
$$p_{i0k} = \sum_j q_{ij0} = q_{i00}\ \text{(say)}\quad\text{and}\quad p_{0jk} = \sum_i q_{ij0} = q_{0j0}\ \text{(say)}.  (3-2-5)$$
As in the case where ‘i’, ‘j’ and ‘k’ are all ‘variates’, so also here, (3-2-4) implies (3-2-5)
but (3-2-5) does not imply (3-2-4). Exactly in the same way as shown by Roy & Kastenbaum
(1956) it can be shown that the extra condition which, when superimposed on (3-2-5), will
imply (3-2-4) is
$$p_{ijk} = q_{ij0}\,q_{0jk}\,q_{i0k}.  (3-2-6)$$


3-3. ‘i’ is a ‘variate’ and ‘j’ and ‘k’ are ‘ways of classification’

Assume that we have $s \times t$ independent sets of sizes $n_{0jk}$ of independent observations such
that $n_{0jk}$ ($j = 1, \ldots, s$; $k = 1, \ldots, t$) is fixed from sample to sample and $p_{ijk}$ is the probability
of an observation in the $(ijk)$th cell. Notice that $\sum_i p_{ijk} = p_{0jk} = 1$. The likelihood function
will be given by
$$\phi = \prod_{j,k}\left[\frac{n_{0jk}!}{\prod_i n_{ijk}!}\prod_i p_{ijk}^{n_{ijk}}\right].  (3-3)$$

Here we shall be interested in the composite

3-3a. Hypothesis that for any ‘k’, $p_{ijk}$ is independent of ‘j’, that is,
$$H_0:\ p_{ijk} = q_{i0k}\ \text{(say)}\quad\text{against } H \ne H_0\quad\text{(for all } i, j \text{ and } k).  (3-3-1)$$
Notice that $\sum_i q_{i0k} = \sum_i p_{ijk} = p_{0jk} = 1$.
We shall be also interested in the other composite
3-3b. Hypothesis that for any ‘j’, $p_{ijk}$ is independent of ‘k’, that is,
$$H_0:\ p_{ijk} = q_{ij0}\ \text{(say)}\quad\text{against } H \ne H_0\quad\text{(for all } i, j \text{ and } k).  (3-3-2)$$
Notice that $\sum_i q_{ij0} = \sum_i p_{ijk} = p_{0jk} = 1$. We now observe that (3-3-1) together with (3-3-2)
implies that $p_{ijk}$ is a pure function of ‘i’, i.e. that
$$p_{ijk} = q_{i00}\ \text{(say)}\quad\text{(for all } i, j \text{ and } k).  (3-3-3)$$














If, in a one-way classification in the usual analysis of variance, ‘i’ corresponds to the 
‘variate’, ‘7’ to the ‘concomitant variate’ and ‘k’ to the ‘way of classification’, then it will 
be seen on a little reflexion that (3-2-1) will be the analogue of the hypothesis of no regression 
and (3-2-4) will be the analogue of the hypothesis of no covariance. On the other hand, 
suppose we take ‘j’ and ‘k’ as just two ‘ways of classification’, for example, if we take ‘7’ 
as, say, blocks and ‘k’ as, say, treatments in a randomized block experiment (with more 
than one and in general unequal number of replications in each cell). Then (3-3-1) will be 
the analogue of ‘no block effect’ for each treatment separately and (3-3-2) will be the 
analogue of ‘no treatment effect’ for each block separately. In other words, in the usual 
parlance of analysis of variance, (3-3-1) combines the hypotheses of ‘no main effect’ and
‘no interaction’, while (3-3-2) combines the hypotheses of another ‘no main effect’ and ‘no 
interaction’. In the analysis of variance situations for data which are not of the ‘normal 
variate’ type, the authors believe that this would be a better way of handling the material 
than the one used by Roy & Kastenbaum. Even for ‘normal variate’ type of data in the 
analysis of variance situations the authors are not sure that from the physical standpoint 
this might not be a better approach than the customary one of analysis into main effects 
and interactions of various orders. It is hoped to consider this in a later paper. 


3-4. ‘i’ is a ‘variate’ and ‘j’ and ‘k’ are ‘ways of classification’ in the sense of a ‘balanced
incomplete’ or ‘partially balanced incomplete’ or a more general type of ‘incomplete’ block
experiment

Assume as before that there are r ‘i’s, s ‘j’s and t ‘k’s. Assume further that ‘j’ is a
block and ‘k’ a treatment and that, for any ‘j’, there is a set of treatments $(T)_j$ associated
with it, of number $t_j$. In other words, for a given j, k takes on the set of values $(t)_j$, where
$(t)_j$ is a set of indices of number $t_j$ out of 1, 2, ..., t. Now assume that we have $\sum_{j=1}^{s} t_j$ independent
sets of sizes $n_{0jk}$ of independent observations such that $n_{0jk}$ ($k \in (t)_j$; j = 1, 2, ..., s) is fixed
from sample to sample and $p_{ijk}$ is the probability of an observation in the (ijk)th cell. As
before $\sum_i p_{ijk} = p_{0jk} = 1$. The likelihood function will be given by

$$\phi = \prod_{j=1}^{s}\prod_{k\in(t)_j}\left[\frac{n_{0jk}!}{\prod_i n_{ijk}!}\prod_i p_{ijk}^{n_{ijk}}\right]. \tag{3-4}$$

We can take over the hypothesis (3-3-1) of ‘no block effect for each treatment separately’
and (3-3-2) of ‘no treatment effect for each block separately’. For a ‘balanced incomplete
design’ all the $t_j$’s will be equal and there will be a highly symmetrical pattern while for a
‘partially balanced design’ all the $t_j$’s will be equal but there will be a less symmetrical
pattern.


3-5. ‘i’ is a ‘way of classification’ and ‘j, k’ also are ‘ways of classification’ in the sense that
the $n_{i00}$’s and $n_{0jk}$’s are fixed from sample to sample

We can write down $\phi_0$ in this case in exactly the same way as we wrote down the $\phi_0$ in
§ 2-3. On the hypothesis of independence between ‘i’ and ‘(j, k)’, this will be

$$\phi_0 = \prod_i n_{i00}!\,\prod_{j,k} n_{0jk}!\Big/\Big(n!\prod_{i,j,k} n_{ijk}!\Big). \tag{3-5}$$

Starting from this we can test the hypothesis of independence between ‘i’ and ‘(j, k)’.











368 Non-parametric generalizations


The case of ‘i’, ‘j’ and ‘k’ being ‘ways of classification’ in the sense of the $n_{i00}$’s, $n_{0j0}$’s and
$n_{00k}$’s (but not the $n_{0jk}$’s) being fixed from sample to sample is also of some interest, but
we shall not consider that case in the present paper.

The extension of the problems of the two-way tables of §2 to those of the three-way 
tables of §3 is a rather big conceptual jump, but the extension from three-way tables to
those of higher dimensions involves no such jump and will not be discussed in this paper 
except for some remarks towards the end. 


4. THE DERIVATION OF THE χ² TEST ON THE UNION-INTERSECTION
PRINCIPLE

Let a random sample of size n from some population be classified into k (< n) mutually
exclusive and exhaustive categories according to some observable characteristics (qualitative
or quantitative), and let the probability of a random observation falling in the ith
category be $p_i$ with $p_i > 0$ and $\sum_{i=1}^{k} p_i = 1$. Let $n_i$ denote the observed frequency in the ith
category with, of course, $\sum_i n_i = n$. Also let $\mathbf{n}' = (n_1, n_2, \ldots, n_k)$ and $\mathbf{p}' = (p_1, p_2, \ldots, p_k)$.
We have now

$$P[\mathbf{n}' \mid \mathbf{p}'] = \frac{n!}{\prod_i n_i!}\prod_i p_i^{n_i}. \tag{4}$$
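4-1. A simple hypothesis $H_0: \mathbf{p}' = \mathbf{p}_0'$ against the composite alternative $H: \mathbf{p}' \neq \mathbf{p}_0'$
Consider first the most powerful test at a level, say β, of $H_0: \mathbf{p}' = \mathbf{p}_0'$ against a specific
$\mathbf{p}' = \mathbf{p}_1'$, which, by the Neyman-Pearson lemma, will be as follows:
Reject $H_0$ if
$$P[\mathbf{n}' \mid \mathbf{p}_1']/P[\mathbf{n}' \mid \mathbf{p}_0'] \geq \mu \tag{4-1-1}$$
and accept $H_0$ otherwise, where, given μ, the size of the critical region (4-1-1) under $\mathbf{p}_0'$
should be $\beta(\mu, \mathbf{p}_0', \mathbf{p}_1', n)$. Substituting in (4-1-1) from (4) and taking logarithms on both sides,
we see after a little simplification that (4-1-1) becomes

$$\frac{\mathbf{a}'(\mathbf{p}_1)(\mathbf{n} - n\mathbf{p}_0)}{\sqrt{\{\mathbf{a}'(\mathbf{p}_1)\Lambda^0\mathbf{a}(\mathbf{p}_1)\}}} \geq \frac{\log\mu - n\,\mathbf{a}'(\mathbf{p}_1)\mathbf{p}_0}{\sqrt{\{\mathbf{a}'(\mathbf{p}_1)\Lambda^0\mathbf{a}(\mathbf{p}_1)\}}} = c \text{ (say)}, \tag{4-1-2}$$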





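where $\mathbf{a}'(\mathbf{p}_1) = [\log(p_{11}/p_{10}), \log(p_{21}/p_{20}), \ldots, \log(p_{k1}/p_{k0})]$, and $\Lambda^0 = (\sigma^0_{ij})$ is the variance
covariance matrix of $n_1, n_2, \ldots, n_k$ under $H_0: \mathbf{p}' = \mathbf{p}_0'$ and where μ is supposed to vary with,
that is depend upon, $\mathbf{p}_1$. It is thus evident that, for a fixed c, the critical region

$$w(\mathbf{p}_1', c) = \left\{\mathbf{n}' : \frac{\mathbf{a}'(\mathbf{p}_1)[\mathbf{n} - n\mathbf{p}_0]}{\sqrt{\{\mathbf{a}'(\mathbf{p}_1)\Lambda^0\mathbf{a}(\mathbf{p}_1)\}}} \geq c\right\} \tag{4-1-3}$$

is the most powerful critical region for testing $\mathbf{p}' = \mathbf{p}_0'$ against a specific $\mathbf{p}' = \mathbf{p}_1'$ ($\neq \mathbf{p}_0'$) at
a level of significance $\beta(\mathbf{p}_1', c, n)$. Since the composite $H: \mathbf{p}' \neq \mathbf{p}_0'$ is the union of all $\mathbf{p}_1' \neq \mathbf{p}_0'$,
we use the union-intersection principle (Roy, 1953) and take for $H_0$ against the composite
H the critical region

$$w(c) = \bigcup_{\mathbf{p}' \neq \mathbf{p}_0'} w(\mathbf{p}', c). \tag{4-1-4}$$

Thus we should have for the complement of w(c)

$$\bar{w}(c) = \left\{\mathbf{n}' : \sup_{\mathbf{p}' \neq \mathbf{p}_0'} \frac{\mathbf{a}'(\mathbf{p})[\mathbf{n} - n\mathbf{p}_0]}{\sqrt{\{\mathbf{a}'(\mathbf{p})\Lambda^0\mathbf{a}(\mathbf{p})\}}} < c\right\}. \tag{4-1-5}$$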
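Since $\sum_i n_i = n$ and $\sum_i p_{i0} = 1$, we can write

$$\frac{\mathbf{a}'(\mathbf{p})[\mathbf{n} - n\mathbf{p}_0]}{\sqrt{\{\mathbf{a}'(\mathbf{p})\Lambda^0\mathbf{a}(\mathbf{p})\}}} = \frac{\mathbf{b}'(\mathbf{p})[\mathbf{n}^* - n\mathbf{p}_0^*]}{\sqrt{\{\mathbf{b}'(\mathbf{p})\Lambda^0_{k-1}\mathbf{b}(\mathbf{p})\}}}, \tag{4-1-6}$$

where $\mathbf{n}^{*\prime} = (n_1, n_2, \ldots, n_{k-1})$, $\mathbf{p}_0^{*\prime} = (p_{10}, p_{20}, \ldots, p_{k-1,0})$,
$\mathbf{b}'(\mathbf{p}) = (b_1(\mathbf{p}), b_2(\mathbf{p}), \ldots, b_{k-1}(\mathbf{p}))$,
$b_i(\mathbf{p}) = a_i(\mathbf{p}) - a_k(\mathbf{p}) = \log(p_i p_{k0}/p_k p_{i0})$ (i = 1, ..., k−1)
and $\Lambda^0_{k-1}$ is the matrix formed by omitting the kth row and the kth column of $\Lambda^0$. Notice
that each $b_i(\mathbf{p})$ can assume any value on the real line and conversely, given any real vector
$\mathbf{b}_0' = (b_{10}, \ldots, b_{k-1,0})$, the equations $\mathbf{b}'(\mathbf{p}) = \mathbf{b}_0'$ have always a unique solution in $\mathbf{p}$; thus

$$p_i/p_k = (p_{i0}/p_{k0})\,e^{b_{i0}} = \lambda_i, \text{ say},$$
or
$$p_i = \lambda_i\Big/\Big(1 + \sum_{i=1}^{k-1}\lambda_i\Big) \quad (i = 1, 2, \ldots, k-1) \quad\text{and}\quad p_k = 1\Big/\Big(1 + \sum_{i=1}^{k-1}\lambda_i\Big).$$

Hence we have (Roy, 1953)

$$\sup_{\mathbf{p}' \neq \mathbf{p}_0'} \frac{\mathbf{a}'(\mathbf{p})[\mathbf{n} - n\mathbf{p}_0]}{\sqrt{\{\mathbf{a}'(\mathbf{p})\Lambda^0\mathbf{a}(\mathbf{p})\}}} = \sup_{\mathbf{b}} \frac{\mathbf{b}'[\mathbf{n}^* - n\mathbf{p}_0^*]}{\sqrt{\{\mathbf{b}'\Lambda^0_{k-1}\mathbf{b}\}}} = +\{[\mathbf{n}^* - n\mathbf{p}_0^*]'(\Lambda^0_{k-1})^{-1}[\mathbf{n}^* - n\mathbf{p}_0^*]\}^{1/2}.$$

We next observe that

$$\sigma^0_{ij} = -np_{i0}p_{j0} \text{ if } i \neq j \quad\text{and}\quad \sigma^0_{ii} = np_{i0}(1 - p_{i0}).$$

It is easy to check (Roy & Sarhan, 1956) that if $(\Lambda^0_{k-1})^{-1} = (a_{ij})$, then $a_{ij} = 1/(np_{k0})$ if $i \neq j$ and
$a_{ii} = 1/(np_{i0}) + 1/(np_{k0})$. We have, therefore,

$$\sup_{\mathbf{p}' \neq \mathbf{p}_0'} \frac{\mathbf{a}'(\mathbf{p})[\mathbf{n} - n\mathbf{p}_0]}{\sqrt{\{\mathbf{a}'(\mathbf{p})\Lambda^0\mathbf{a}(\mathbf{p})\}}} = +\left[\sum_{i=1}^{k}\frac{(n_i - np_{i0})^2}{np_{i0}}\right]^{1/2}.$$

From (4-1-5) we see that (4-1-4) becomes

$$w(c) = \left\{\mathbf{n}' : +\left[\sum_{i=1}^{k}\frac{(n_i - np_{i0})^2}{np_{i0}}\right]^{1/2} \geq c\right\}. \tag{4-1-7}$$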

Since the left-hand side of the inequality in (4-1-7) is essentially non-negative it is easy to
see that we obtain a non-trivial solution only when c > 0. It is thus seen that the χ² critical
region is obtained by using the union-intersection with respect to variation over the alternatives
$\mathbf{p}'$ ($\neq \mathbf{p}_0'$), keeping fixed a quantity c defined (in terms of $\mathbf{p}_1$ and n) by the right-hand
side of (4-1-2). This means of course letting μ vary with $\mathbf{p}$ in an appropriate manner. Now
if n is large, from the asymptotic normality of the left-hand side of the inequality in (4-1-3),
we have, as $n \to \infty$,

$$\beta(\mathbf{p}_1', c, n) \to \frac{1}{\sqrt{(2\pi)}}\int_c^{\infty} e^{-\frac{1}{2}t^2}\,dt. \tag{4-1-8}$$

In large samples it is thus seen that keeping c fixed means making β the same for all $\mathbf{p}_1$’s,
which means that in large samples the χ² critical region (4-1-7) is a union-intersection critical
region of type I (Roy, 1953). For large $n_i$’s (the approximation would be good enough even
for moderately large values of the $n_i$’s) it is well known that the square of the left-hand side of the
inequality in (4-1-7) is asymptotically distributed as χ² with (k − 1) degrees of freedom. For a
satisfactory proof see Cramér (1946, chapter 30).
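
The reduction of the supremum to the familiar χ² form can be checked numerically. The sketch below is illustrative only: the counts and the null probabilities are made-up values, and the function names are not from the paper. It evaluates the standardized contrast a'(n − np₀)/√(a'Λ⁰a) for an arbitrary real vector a, using a'Λ⁰a = n Σ p_{i0}(a_i − ā)² with ā = Σ p_{i0} a_i, and compares it with √χ². The maximizing choice a_i = (n_i − np_{i0})/(np_{i0}) is an assumption of this sketch (it follows from the Cauchy-Schwarz inequality), not a statement taken from the text.

```python
import math
import random

def pearson_chi_square(counts, p0):
    """Sum over cells of (n_i - n p_i0)^2 / (n p_i0)."""
    n = sum(counts)
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, p0))

def standardized_contrast(a, counts, p0):
    """a'(n - n p0) / sqrt(a' L a), with L the multinomial covariance under p0."""
    n = sum(counts)
    numerator = sum(ai * (c - n * p) for ai, c, p in zip(a, counts, p0))
    mean_a = sum(ai * p for ai, p in zip(a, p0))
    # a' L a = n * sum_i p_i0 (a_i - mean_a)^2 for L = n (diag(p0) - p0 p0')
    variance = n * sum(p * (ai - mean_a) ** 2 for ai, p in zip(a, p0))
    return numerator / math.sqrt(variance)

# Hypothetical data: 100 observations in k = 3 categories.
counts = [18, 55, 27]
p0 = [0.25, 0.50, 0.25]
n = sum(counts)

chi2 = pearson_chi_square(counts, p0)

# Candidate maximizer (an assumption of this sketch).
a_best = [(c - n * p) / (n * p) for c, p in zip(counts, p0)]
best = standardized_contrast(a_best, counts, p0)

# No randomly chosen direction should exceed the candidate maximum.
rng = random.Random(0)
trials = [standardized_contrast([rng.uniform(-5, 5) for _ in p0], counts, p0)
          for _ in range(1000)]
largest_random = max(trials)
```

Here `best` agrees with `math.sqrt(chi2)` up to rounding, and `largest_random` stays below it, which is the content of the step from (4-1-5) to (4-1-7).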



4-2. Test of a composite hypothesis against a composite alternative 


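Suppose that the composite null hypothesis is given by

$$H_0: \{p_i = p_i(\theta_1, \ldots, \theta_r)\}, \quad (\theta_1, \ldots, \theta_r) \in \Omega,$$

where $p_i(\theta_1, \ldots, \theta_r)$ are k known functions of r (< k) unknown parameters. The hypothesis
does not specify the values of the parameters except that they belong to a certain parametric
space Ω. The (composite) alternative is $H \neq H_0$. For any given $(\theta_1^0, \theta_2^0, \ldots, \theta_r^0)$, we
obtain, as in the previous section, a heuristic test of the hypothesis $H_0': \{p_i = p_i(\theta_1^0, \ldots, \theta_r^0)\}$
against $H \neq H_0'$, which has a critical region

$$w(c, \theta_1^0, \ldots, \theta_r^0) = \left\{\mathbf{n}' : \left[\sum_{i=1}^{k}\frac{[n_i - np_i(\theta_1^0, \ldots, \theta_r^0)]^2}{np_i(\theta_1^0, \ldots, \theta_r^0)}\right]^{1/2} \geq c\right\}. \tag{4-2-1}$$

This critical region is the region of rejection of $H_0': \{p_i = p_i(\theta_1^0, \ldots, \theta_r^0)\}$ for a specific
$(\theta_1^0, \ldots, \theta_r^0)$. Now to reject $H_0: \{p_i = p_i(\theta_1, \ldots, \theta_r)\}$, $(\theta_1, \ldots, \theta_r) \in \Omega$, would be to reject

$$H_0': p_i = p_i(\theta_1^0, \ldots, \theta_r^0) \quad\text{for every } (\theta_1^0, \ldots, \theta_r^0) \in \Omega,$$

and thus using the union-intersection principle for the second time we have for $H_0$ the
critical region

$$w(c) = \left\{\mathbf{n}' : \inf_{(\theta_1, \ldots, \theta_r) \in \Omega}\left[\sum_{i=1}^{k}\frac{[n_i - np_i(\theta_1, \ldots, \theta_r)]^2}{np_i(\theta_1, \ldots, \theta_r)}\right]^{1/2} \geq c\right\}, \tag{4-2-2}$$

which is precisely the minimum χ² critical region. The equations giving the $\theta_j$’s in terms of
the $n_i$’s, in the form for minimum χ², are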


$$\frac{\partial}{\partial\theta_j}\sum_{i=1}^{k}\frac{(n_i - np_i)^2}{np_i} = 0 \quad (j = 1, 2, \ldots, r). \tag{4-2-3}$$


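It has been shown by Cramér that for large $n_i$’s the equations (4-2-3) can be replaced by the
maximum-likelihood equations which are simpler to use, the likelihood function being

$$\phi \propto \prod_{i=1}^{k} p_i^{n_i}(\theta_1, \ldots, \theta_r). \tag{4-2-4}$$

The maximum-likelihood equations can be put in the form

$$\frac{\partial}{\partial\theta_j}\log\phi = \sum_{i=1}^{k}\frac{n_i}{p_i}\frac{\partial p_i}{\partial\theta_j} = \sum_{i=1}^{k}\frac{n_i - np_i}{p_i}\frac{\partial p_i}{\partial\theta_j} = 0 \quad (j = 1, 2, \ldots, r). \tag{4-2-5}$$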
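
The closeness of the minimum-χ² and maximum-likelihood solutions can be illustrated on a one-parameter family. The sketch below is a hypothetical example, not one from the paper: it assumes k = 3 cells with p(θ) = (θ², 2θ(1 − θ), (1 − θ)²), made-up counts, and a plain grid search in place of solving the minimum-χ² equations analytically. For this family the maximum-likelihood equation has the closed-form solution θ̂ = (2n₁ + n₂)/(2n).

```python
def cell_probs(theta):
    """Hypothetical one-parameter family with k = 3 cells."""
    return [theta ** 2, 2 * theta * (1 - theta), (1 - theta) ** 2]

def chi_square(counts, probs):
    """Sum over cells of (n_i - n p_i)^2 / (n p_i)."""
    n = sum(counts)
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, probs))

def min_chi_square_estimate(counts, steps=10000):
    """Grid search for the theta in (0, 1) that minimizes chi-square."""
    grid = (k / steps for k in range(1, steps))
    return min(grid, key=lambda th: chi_square(counts, cell_probs(th)))

# Made-up counts with n = 1000.
counts = [350, 500, 150]
n = sum(counts)

theta_min_chi2 = min_chi_square_estimate(counts)
theta_ml = (2 * counts[0] + counts[1]) / (2 * n)   # solves the likelihood equation

chi2_at_min = chi_square(counts, cell_probs(theta_min_chi2))
chi2_at_ml = chi_square(counts, cell_probs(theta_ml))
```

With counts of this size the two estimates differ only in the fourth decimal place, and the χ² value at the maximum-likelihood point is correspondingly close to the minimum.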


5. LARGE-SAMPLE TESTS OF THE HYPOTHESES IN §2 
Assuming that $\sum_i n_{ij} = n_{0j}$ (fixed) and $\sum_i p_{ij} = p_{0j} = 1$ (where i itself may be a multiple
subscript like $i_1 i_2 \ldots i_k$ and j may be a multiple subscript like $j_1 j_2 \ldots j_l$) and using the
traditional approach of conditional probability, we have, under any hypothesis which
expresses the $p_{ij}$’s in terms of a lesser number of free or nuisance parameters, the result that,
as $n \to \infty$ subject to $n_{0j}/n$ being held constant, $\sum_{i,j}(n_{ij} - n_{0j}\hat{p}_{ij})^2/(n_{0j}\hat{p}_{ij})$ tends to have the
χ²-distribution with degrees of freedom equal to the number of cells minus the number of
linear constraints on the $n_{ij}$’s (which we shall here replace by the number of separate
multinomial distributions) minus the number of independent parameters that are estimated
from the data. Notice that the $\hat{p}_{ij}$’s are the maximum-likelihood estimates of the $p_{ij}$’s
obtained by expressing the $p_{ij}$’s in terms of the lesser number of free parameters from the
hypothesis, then substituting in φ, and then maximizing the φ with respect to these free or
nuisance parameters. However, here we do not rely on the traditional approach but upon
a proof of the above theorem which starts from

$$\prod_j\left[\frac{n_{0j}!}{\prod_i n_{ij}!}\prod_i p_{ij}^{n_{ij}}\right]$$

and proceeds along the lines followed by Cramér. This, among other theorems, is given in
a paper to be shortly submitted to Biometrika.


5-1. The problem of § 2-1 
We consider § 2-1, start from (2-1-1), maximize $\log\phi_0$ with respect to the $p_{i0}$’s and $p_{0j}$’s
subject to $\sum_i p_{i0} = \sum_j p_{0j} = 1$ (using Lagrangian multipliers) and obtain the maximum-likelihood
solutions: $\hat{p}_{i0} = n_{i0}/n$ and $\hat{p}_{0j} = n_{0j}/n$. The number of independent parameters
estimated from the data is r + s − 2, and hence by §5, the test of independence here is
based on a statistic which has the χ²-distribution with degrees of freedom

rs − 1 − (r + s − 2) = (r − 1)(s − 1)
and whose form is

$$\sum_{i,j}\frac{\left(n_{ij} - n\,\dfrac{n_{i0}}{n}\,\dfrac{n_{0j}}{n}\right)^2}{n\,\dfrac{n_{i0}}{n}\,\dfrac{n_{0j}}{n}} = \sum_{i,j}\frac{\left(n_{ij} - \dfrac{n_{i0}n_{0j}}{n}\right)^2}{\dfrac{n_{i0}n_{0j}}{n}}. \tag{5-1-1}$$
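
As a concrete illustration of (5-1-1), the following sketch computes the statistic and its degrees of freedom for an r × s table given as a list of rows. The 2 × 2 table is made-up data and the function name is illustrative only.

```python
def independence_chi_square(table):
    """Return (statistic, degrees of freedom) for an r x s frequency table.

    The expected frequency in cell (i, j) is n_i0 * n_0j / n, and the
    degrees of freedom are (r - 1)(s - 1).
    """
    r, s = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_totals = [sum(table[i]) for i in range(r)]                          # n_i0
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(s)]    # n_0j
    statistic = 0.0
    for i in range(r):
        for j in range(s):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (table[i][j] - expected) ** 2 / expected
    return statistic, (r - 1) * (s - 1)

# Made-up 2 x 2 table with n = 100.
statistic, dof = independence_chi_square([[10, 20], [30, 40]])
```

For this table the expected frequencies are 12, 18, 28 and 42, so the statistic is 4/12 + 4/18 + 4/28 + 4/42 = 50/63 on one degree of freedom.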
nn n 


5-2. The problem of § 2-2 
We start from (2-2-1) and maximize $\log\phi_0$ with respect to the $q_{0j}$’s subject to $\sum_j q_{0j} = 1$ and
obtain the maximum-likelihood solutions: $\hat{q}_{0j} = n_{0j}/n$. The number of independent parameters
estimated from the data is s − 1, and hence by §5 the test here is to be based on
a statistic which has the χ²-distribution with degrees of freedom

r(s − 1) − (s − 1) = (r − 1)(s − 1)
and whose form is

$$\sum_{i,j}\left(n_{ij} - \frac{n_{i0}n_{0j}}{n}\right)^2\Big/\frac{n_{i0}n_{0j}}{n}. \tag{5-2-1}$$





5-3. The problem of § 2-3

We start from (2-3). Here we have already, under the null hypothesis, $\hat{p}_{ij} = n_{i0}n_{0j}/n^2$,
and using the remarks of § 5 we note that the test is based on a statistic having the χ²-distribution
with degrees of freedom rs − (r + s − 1) = (r − 1)(s − 1) and the form

$$\sum_{i,j}\left(n_{ij} - \frac{n_{i0}n_{0j}}{n}\right)^2\Big/\frac{n_{i0}n_{0j}}{n}. \tag{5-3-1}$$


6. LARGE-SAMPLE TESTS OF THE HYPOTHESES IN §3 
6-1. The problems of § 3-1, i.e. where ‘i’, ‘j’ and ‘k’ are all ‘variates’ 


6-1a. The problem of § 3-1a
Independence between ‘i’ and ‘j’ for fixed ‘k’. Under $H_0$ of (3-1-1) we shall have

$$\phi_0 \propto \prod_{i,j,k}(p_{i0k}\,p_{0jk}/p_{00k})^{n_{ijk}}. \tag{6-1-1}$$

To test the hypothesis here we maximize $\log\phi_0$ with respect to the $p_{i0k}$’s, $p_{0jk}$’s and $p_{00k}$’s
subject to $\sum_i p_{i0k} = \sum_j p_{0jk} = p_{00k}$ and $\sum_k p_{00k} = 1$, and obtain the maximum-likelihood
solutions $\hat{p}_{i0k} = n_{i0k}/n$, $\hat{p}_{0jk} = n_{0jk}/n$ and $\hat{p}_{00k} = n_{00k}/n$. The number of independent parameters
estimated from the data is (r − 1)t + (s − 1)t + (t − 1). And hence by §5 the test of
conditional independence is here based on a statistic which has the χ²-distribution with
degrees of freedom rst − 1 − t(r − 1) − t(s − 1) − (t − 1) = t(r − 1)(s − 1) and whose form is

$$\sum_{i,j,k}\left(n_{ijk} - \frac{n_{i0k}n_{0jk}}{n_{00k}}\right)^2\Big/\frac{n_{i0k}n_{0jk}}{n_{00k}}. \tag{6-1-2}$$

Independence between ‘i’ and ‘k’ and also between ‘j’ and ‘k’. This can be handled exactly
on the lines of § 5 and will not be discussed separately.
Independence between ‘i’, ‘j’ and ‘k’. To test this we start from the hypothesis of (3-1-3)
giving
$$\phi_0 \propto \prod_{i,j,k}(p_{i00}\,p_{0j0}\,p_{00k})^{n_{ijk}}, \tag{6-1-3}$$
maximize $\log\phi_0$ with respect to the $p_{i00}$’s, $p_{0j0}$’s and $p_{00k}$’s subject to $\sum_i p_{i00} = \sum_j p_{0j0} = \sum_k p_{00k} = 1$
and obtain the maximum-likelihood solutions: $\hat{p}_{i00} = n_{i00}/n$, $\hat{p}_{0j0} = n_{0j0}/n$ and $\hat{p}_{00k} = n_{00k}/n$.
The number of independent parameters estimated from the data is (r + s + t − 3), and hence
by §5 the test is here based on a statistic which has the χ²-distribution with degrees of
freedom rst − 1 − (r + s + t − 3) = rst − r − s − t + 2, and whose form is

$$\sum_{i,j,k}\left(n_{ijk} - \frac{n_{i00}n_{0j0}n_{00k}}{n^2}\right)^2\Big/\frac{n_{i00}n_{0j0}n_{00k}}{n^2}. \tag{6-1-4}$$

6-1b. The problems of § 3-1b
Independence between ‘(i, j)’ and ‘k’. Under $H_0$ of (3-1-4) we shall have
$$\phi_0 \propto \prod_{i,j,k}(p_{ij0}\,p_{00k})^{n_{ijk}}. \tag{6-1-5}$$

To test this hypothesis we maximize $\log\phi_0$ with respect to the $p_{ij0}$’s and $p_{00k}$’s subject to
$\sum_{i,j} p_{ij0} = \sum_k p_{00k} = 1$ and obtain the maximum-likelihood solutions: $\hat{p}_{ij0} = n_{ij0}/n$ and
$\hat{p}_{00k} = n_{00k}/n$. The number of independent parameters estimated from the data is
(rs − 1) + (t − 1) and hence by § 5 the test is based on a statistic having the χ²-distribution
with degrees of freedom rst − 1 − [(rs − 1) + (t − 1)] = (rs − 1)(t − 1) and having the form

$$\sum_{i,j,k}\left(n_{ijk} - \frac{n_{ij0}n_{00k}}{n}\right)^2\Big/\frac{n_{ij0}n_{00k}}{n}. \tag{6-1-6}$$
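
The three statistics of this subsection differ only in the expected frequency placed in each cell and in the degrees of freedom. The sketch below, with illustrative names and made-up data, computes all three for an r × s × t table stored as nested lists `n[i][j][k]`: conditional independence of ‘i’ and ‘j’ for fixed ‘k’, complete independence, and independence of ‘(i, j)’ and ‘k’. It assumes every marginal total that appears in a denominator is positive.

```python
from itertools import product

def three_way_tests(n):
    """Return {name: (statistic, degrees of freedom)} for an r x s x t table."""
    r, s, t = len(n), len(n[0]), len(n[0][0])
    cells = list(product(range(r), range(s), range(t)))
    total = sum(n[i][j][k] for i, j, k in cells)

    # Marginal totals, named after the paper's subscript convention.
    n_i0k = [[sum(n[i][j][k] for j in range(s)) for k in range(t)] for i in range(r)]
    n_0jk = [[sum(n[i][j][k] for i in range(r)) for k in range(t)] for j in range(s)]
    n_ij0 = [[sum(n[i][j]) for j in range(s)] for i in range(r)]
    n_i00 = [sum(n_ij0[i]) for i in range(r)]
    n_0j0 = [sum(n_ij0[i][j] for i in range(r)) for j in range(s)]
    n_00k = [sum(n_i0k[i][k] for i in range(r)) for k in range(t)]

    def pearson(expected):
        return sum((n[i][j][k] - expected(i, j, k)) ** 2 / expected(i, j, k)
                   for i, j, k in cells)

    return {
        "conditional": (pearson(lambda i, j, k: n_i0k[i][k] * n_0jk[j][k] / n_00k[k]),
                        t * (r - 1) * (s - 1)),
        "complete": (pearson(lambda i, j, k: n_i00[i] * n_0j0[j] * n_00k[k] / total ** 2),
                     r * s * t - r - s - t + 2),
        "joint_vs_k": (pearson(lambda i, j, k: n_ij0[i][j] * n_00k[k] / total),
                       (r * s - 1) * (t - 1)),
    }

# Made-up 2 x 2 x 3 table built as a product of margins, so that complete
# independence holds exactly and all three statistics vanish.
a, b, c = [1, 2], [1, 3], [2, 5, 1]
product_table = [[[a[i] * b[j] * c[k] for k in range(3)] for j in range(2)] for i in range(2)]
results = three_way_tests(product_table)

# Perturbing the table breaks the product structure.
perturbed = [[row[:] for row in layer] for layer in product_table]
perturbed[0][0][0] += 4
perturbed_results = three_way_tests(perturbed)
```

For a 2 × 2 × 3 table the degrees of freedom are 3, 7 and 6 respectively.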










Independence between ‘i’ and ‘k’ and between ‘j’ and ‘k’. Since this can be handled on the 
same lines as in § 5, it will not be separately discussed. 

The ‘no interaction’ hypothesis of (3-1-6). This has been discussed in detail in another paper 
by Roy & Kastenbaum (1956) and will not be discussed here. The test will be based on a
statistic having the χ²-distribution with degrees of freedom (r − 1)(s − 1)(t − 1) and having
a rather complicated form (see Bartlett, 1935; Norton, 1945; Roy & Kastenbaum, 1956).


6-2. The problems of § 3-2, i.e. when ‘i’ and ‘j’ are ‘variates’ and ‘k’ is a ‘way of classification’ 
6-2a. The problems of § 3-2a
Independence between ‘i’ and ‘j’ for each k. Under $H_0$ of (3-2-1) we start from

$$\phi_0 \propto \prod_{i,j,k}(p_{i0k}\,p_{0jk})^{n_{ijk}} \tag{6-2-1}$$

and maximize $\log\phi_0$ with respect to the $p_{i0k}$’s and $p_{0jk}$’s subject to $\sum_i p_{i0k} = \sum_j p_{0jk} = p_{00k} = 1$,
and obtain the maximum-likelihood solutions: $\hat{p}_{i0k} = n_{i0k}/n_{00k}$, $\hat{p}_{0jk} = n_{0jk}/n_{00k}$. The
number of independent parameters to be estimated from these data is t(r − 1) + t(s − 1),
and hence the test here is to be based on a statistic having the χ²-distribution with degrees
of freedom t(rs − 1) − t(r − 1) − t(s − 1) = t(r − 1)(s − 1) and having the form

$$\sum_k\left[\sum_{i,j}\left(n_{ijk} - \frac{n_{i0k}n_{0jk}}{n_{00k}}\right)^2\Big/\frac{n_{i0k}n_{0jk}}{n_{00k}}\right]. \tag{6-2-2}$$

The problems under (3-2-2) or (3-2-3) will not be discussed separately.
6-2b. The problems of § 3-2b

The hypothesis that $p_{ijk}$ is independent of ‘k’. Under $H_0$ of (3-2-4) we start from

$$\phi_0 \propto \prod_{i,j,k} q_{ij0}^{n_{ijk}}, \tag{6-2-3}$$
maximize $\log\phi_0$ with respect to the $q_{ij0}$’s subject to $\sum_{i,j} q_{ij0} = 1$, and obtain the maximum-likelihood
solutions: $\hat{q}_{ij0} = n_{ij0}/n$. The number of independent parameters to be estimated
from the data is (rs − 1), and hence by §5 the test is to be based on a statistic having the
χ²-distribution with degrees of freedom t(rs − 1) − (rs − 1) = (rs − 1)(t − 1) and having the
form

$$\sum_k\sum_{i,j}\left(n_{ijk} - \frac{n_{00k}n_{ij0}}{n}\right)^2\Big/\frac{n_{00k}n_{ij0}}{n}. \tag{6-2-4}$$

The problems under (3-2-5) or (3-2-6) will not be separately discussed here.





6-3. The problems of §3-3, i.e. when ‘i’ is a ‘variate’ and ‘j’ and ‘k’ are ‘ways of classification’ 
6-3a. The problem of § 3-3a
The hypothesis that for any ‘k’, $p_{ijk}$ is independent of ‘j’. Under $H_0$ of (3-3-1) we start from
$$\phi_0 \propto \prod_{i,j,k} q_{i0k}^{n_{ijk}} \tag{6-3-1}$$

and maximize $\log\phi_0$ with respect to the $q_{i0k}$’s subject to $\sum_i q_{i0k} = 1$, and obtain the maximum-likelihood
solutions $\hat{q}_{i0k} = n_{i0k}/n_{00k}$. The number of independent parameters to be estimated
from the data is t(r − 1), and hence by §5 the test is to be based on a statistic having the
χ²-distribution with degrees of freedom st(r − 1) − t(r − 1) = t(r − 1)(s − 1) and having the
form

$$\sum_{j,k}\left[\sum_i\left(n_{ijk} - \frac{n_{0jk}n_{i0k}}{n_{00k}}\right)^2\Big/\frac{n_{0jk}n_{i0k}}{n_{00k}}\right]. \tag{6-3-2}$$

6-3b. The problem of § 3-3b

This will be exactly on the same lines as the previous case and will not be discussed
separately. We shall also omit a discussion of the problem under (3-3-3).








6-4. The problems of § 3-4, i.e. when ‘i’ is a variate and ‘j’ and ‘k’ are ways of classification in
the sense of an incomplete design
The hypothesis that $p_{ijk}$ is independent of ‘j’. We start from (3-4), put $p_{ijk} = q_{i0k}$ and thus
have $\phi_0 \propto \prod_{i,k} q_{i0k}^{n_{i0k}}$. On maximizing $\log\phi_0$ with respect to the $q_{i0k}$’s subject to $\sum_i q_{i0k} = 1$,
we obtain a solution for the $q_{i0k}$’s in terms of the $n_{ijk}$’s which is a rather complicated set
of functions of the $n_{ijk}$’s of the same structure as the corresponding least-squares solutions in
linear estimation. One or two such solutions for some linked block designs will be discussed
in a later paper. However, this solution, inserted in the χ² functions, will have the χ²-distribution
with degrees of freedom $(r-1)\sum_{j=1}^{s} t_j - (rt-1)$.
The hypothesis that $p_{ijk}$ is independent of ‘k’ can be handled on exactly similar lines and
need not be separately considered.


7. CONCLUDING REMARKS 


The extension from three to more dimensions does not present any essentially new pro- 
blems. One or two additional features of interest of such extensions will be discussed later. 

The authors believe that this is an attempt at a somewhat systematic exposition 
(i) which is based on a clear distinction between a ‘variate’ and a ‘way of classification’, that 
stems from differing experimental situations and sampling schemes, (ii) which sets up 
different probability models for the different situations, and (iii) which poses different types 
of hypothesis according as it is a ‘multivariate analysis’ situation or an ‘analysis of variance’ 
situation or something of a mixed type. Every type of experimental situation has its own 
appropriate sampling scheme and appropriate probability model, and the authors believe 
that it would not be proper to force some kind of sampling scheme and probability model 
on the wrong kind of experimental situation and also to pose one kind of hypothesis on the 
wrong kind of sampling scheme and probability model. 

For two analogous null hypotheses under two different probability models we have
eventually the same χ² with the same distribution under the respective null hypotheses.
But the powers of these tests, that is, the distributions of these statistics under the respective
non-null hypotheses, are entirely different. In large samples these powers would of course,
in the usual sense, tend to 1 in each case. But the asymptotic powers (in the sense of Pitman
and Lehmann, which is perhaps the only sense that is relevant here) can be obtained and
compared. They have been obtained; they are different and will be given in a later paper.

In many situations in which a particular hypothesis $H_0$ with an associated χ² is the intersection
of several hypotheses $H_{01}, H_{02}, \ldots$, with associated $\chi_1^2, \chi_2^2, \ldots$, it so happens that
$\chi^2 = \chi_1^2 + \chi_2^2 + \ldots$ and $\chi_1^2, \chi_2^2, \ldots$ are also independently distributed, but the additivity is not in
the usual algebraic sense; it is only in probability and asymptotically as $n \to \infty$, and the independence
is also in the asymptotic sense. Take, for example, the hypotheses (3-1-1), (3-1-2)
and (3-1-3) and let us call them $H_{01}$, $H_{02}$ and $H_{03}$, and $H_0$. We note that $H_0 = H_{01} \cap H_{02} \cap H_{03}$.
Let the associated χ²’s be denoted by $\chi_1^2$, $\chi_2^2$, $\chi_3^2$ and $\chi^2$. Then, in this case it has been shown
that, in large samples and under the null hypothesis, $\chi_1^2$, $\chi_2^2$ and $\chi_3^2$ are independently distributed
and $\chi_1^2 + \chi_2^2 + \chi_3^2 \to \chi^2$ in probability. We have an exactly similar situation for the
group of hypotheses (3-1-4), (3-1-5) and (3-1-6). These are situations in multivariate analysis. 
There are similar situations in analysis of variance also, for example, with the group of 
hypotheses (3-2-4), (3-2-5) and (3-2-6) or with the group (3-3-1), (3-3-2) and (3-3-3). But this 
will not be true, for example, with a similar group of hypotheses on an incomplete block 
design or general types of designs indicated in §3-4. A proof of ‘independence’ and 
‘additivity’ covering the problems of this paper and some other problems (where ‘indepen- 
dence’ and ‘additivity’ exist) will be given in a later paper. 

We have defined the hypothesis of ‘no interaction’ between ‘i’ and ‘j’ when ‘i’, ‘j’ and
‘k’ are all ‘variates’ or ‘i’ and ‘j’ are variates and ‘k’ is a way of classification. It is a kind
of bridge over the gap between (a) the hypotheses of independence of ‘i’ and ‘k’, ‘j’ and ‘k’,
and (b) the hypotheses of independence of ‘(i, j)’ and ‘k’. Through this mechanism we get a
formula of which the special cases for 2 × 2 × 2 and 2 × 2 × t were obtained by earlier workers
(Bartlett, 1935; Norton, 1945). They obtained the formula through presumably some different 
mechanism, and perhaps for the analysis of variance situations where ‘i’ and ‘j’ are ‘ways 
of classification’ and ‘k’ is a ‘variate’. The authors feel that for the analysis of variance 
situations with frequency types of data the hypothesis of ‘no interaction’ in the form in 
which it is usually posed and tested may not be too meaningful, and they doubt whether, 
even with ‘normal variate’ type of data, it is very useful. The controversy that has raged 
for many years now over questions of interpretation is not without its own lessons. The 
authors feel this way about the whole concept of interaction in analysis of variance. 
Looking for a possible motivation behind the customary (and mostly ‘normal variate’) 
analysis, one cannot help feeling that factorial experiments (whether on the ‘normal variate’ 
type of data or more general types of data) present a problem which is essentially different 
from that of the rest of analysis of variance, e.g. the usual tests of significance of treatment 
differences. Assume, for simplicity, in the beginning that there is just one factor at, say, k 
levels. One might regard these as treatments, and test whether there are significant differ- 
ences between these, or, in terms of some characteristic, pick out the ‘best’ among these or 
rank these in some order. But that does not appear to be the relevant question here. The 
(second) characteristic in terms of which we have the k levels seems to be a continuous 
variate which is observed at k levels for practical convenience and what is of interest seems 
to be to lay down a statistical rule by which we can decide about the ‘best’ or ‘optimum’ 
point in the range of the continuous variate, the decision rule being in terms of observations
at discrete levels and the ‘best’ or ‘optimum’ being in relation to the first characteristic. 
Likewise, taking, for example, two factors at k and l levels respectively, the problem seems
to be not to test whether there are significant differences between these kl combinations
regarded as treatments (which would really be a linear problem) or to pick out the ‘best’ 
among these or to rank these (in terms of some characteristic), which again would be each 
a really linear problem. It seems that there is a (second) characteristic in terms of which 
we have the k levels of the first factor and a third characteristic in terms of which we have 













the l levels of the second factor, both these (second and third) characteristics being supposed
to be continuous variates. The problem is to lay down a statistical rule by which we can
decide about the ‘best’ or ‘optimum’ point (in relation to the first characteristic) on the
plane of the second and third characteristics, regarded as two continuous variates, the
decision rule being in terms of observations at the kl discrete level combinations. This of
course can be generalized to several factors. The customary analysis into ‘main effects’,
‘interactions’ of various orders, confounding, etc., all seem to point very strongly in this
direction. Some work has already been done from this standpoint and further work is under
way. This will be discussed in a later monograph.


In conclusion, it is a great pleasure to thank the editor and the referees for their valuable
suggestions for the improvement of the paper both in form and in content. 


REFERENCES 


Barnard, G. A. (1947). Significance tests for 2 × 2 tables. Biometrika, 34, 123-38.

Bartlett, M. S. (1935). Contingency table interactions. J. R. Statist. Soc. Suppl. 2, 248-52.

Cramér, H. (1946). Mathematical Methods of Statistics. Princeton University Press.

Fisher, R. A. (1922). On the interpretation of χ² from contingency tables and the calculation of P.
J. R. Statist. Soc. 85, 87-94.

Neyman, J. (1949). Contribution to the theory of the χ² test. Proceedings of the Berkeley Symposium
on Mathematical Statistics and Probability, pp. 239-73. University of California Press.

Norton, H. W. (1945). Calculation of chi-square for complex contingency tables. J. Amer. Statist.
Ass. 40, 251-8.

Pearson, E. S. (1947). The choice of statistical tests illustrated on the interpretation of data classed
in a 2 × 2 table. Biometrika, 34, 139-67.

Reiersøl, O. (1954). Tests of linear hypotheses concerning binomial experiments. Skand. AktuarTidskr.
37, 38-59.

Roy, S. N. (1953). On a heuristic method of test construction and its use in multivariate analysis.
Ann. Math. Statist. 24, 220-38.

Roy, S. N. & Kastenbaum, M. A. (1956). On the hypothesis of no interaction in a multiway contingency
table. Ann. Math. Statist. 27, 749-57.

Roy, S. N. & Sarhan, A. E. (1956). On inverting a class of patterned matrices. Biometrika, 43,
227-31.

Yule, G. U. & Kendall, M. G. (1950). An Introduction to the Theory of Statistics. 14th ed. London:
Chas. Griffin and Co.










[ 377 ] 


A TWO-SAMPLE DISTRIBUTION-FREE TEST


By A. R. KAMAT 


Fergusson College, University of Poona 


1. In 1947 Mann & Whitney proposed a test for the equivalence of two distribution
functions which was based on the sample ranks. Given two samples of size m and n they
proposed that the two samples be pooled and ranked in ascending (or descending) order of
magnitude. The sum of the ranks of one of the samples is then used in a test of the hypothesis
that the two samples have come from the same population or, if it is preferred, of the
hypothesis of the equivalence of the two distribution functions. Such a criterion will
clearly be most sensitive to possible differences of location of the two distribution functions.
Rosenbaum (1953) assumed that the location parameters of the distribution functions were
equal, and the criterion he suggested could be used to test for the equivalence of parameters
of dispersion, by counting the numbers in one sample falling outside the range of the other.
An alternative test to Rosenbaum’s is discussed here, using a criterion based on the rank
ranges of both samples.


2. It is assumed that there are two samples $\{x_i\}$ (i = 1, 2, ..., n) and $\{y_j\}$ (j = 1, 2, ..., m;
m ≥ n). The location parameters of the distributions of x and y are assumed to be equal.
The samples are pooled and arranged in order of magnitude. Let $R_n$ and $R_m$ be the range of
ranks of x and y respectively. The test statistic proposed is

$$D_{n,m} = R_n - R_m + m, \tag{1}$$

where $D_{n,m}$ can take values 0, 1, ..., m + n. Large or small values of $D_{n,m}$ will indicate
possible divergence from the hypothesis that the parameters of dispersion of the populations
from which the samples have been drawn are equal.

3. The following example illustrates the calculation of $D_{n,m}$. Two analysts in the same
laboratory made repeated determinations of the percentage fibre in a sample of soya cattle
cake which had been ground down into a uniformly mixed powder. The first analyst made
n = 8 determinations and the second m = 10. The observations were as follows, the ranks
of each, from 1 to 18, after pooling being shown.* It will be seen that

$$D_{8,10} = 12 - 17 + 10 = 5.$$

The question is whether this result suggests a significant difference in the accuracy with
which the two men can repeat their analyses:

Analyst 1:  Observations  12·38  12·53  12·25  12·37  12·48  12·43  12·30  12·46
            Pooled rank     8      2     14      9      3      6     12      4

Analyst 2:  Observations  12·45  12·31  12·31  12·20  12·26  12·73  12·42  12·17  12·09  12·23
            Pooled rank     5    10·5   10·5    16     13      1      7     17     18     15

n = 8, $R_n$ = 14 − 2 = 12;  m = 10, $R_m$ = 18 − 1 = 17.

If we enter the table of percentage points (p. 378 below) with m + n = 18, n = 8, it is
seen that for a two-tailed test, using a 5 % significance level, the lower and upper limits for
$D_{8,10}$ are 3 and 15 respectively, so that the observed value of 5 is not significant.

* There is one tie, i.e. two readings of 12·31 % by analyst 2, but this does not affect the value of $D_{8,10}$.
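
The calculation in this example can be reproduced with a few lines of code. The sketch below is illustrative; the function name is not from the paper. It pools the two samples, ranks them in descending order as in the example, and returns the two rank ranges and the statistic. Ties are broken arbitrarily by position, which is an assumption of this sketch: it is harmless here because the tied readings belong to the same analyst and are not extreme, but a tie involving the smallest or largest rank of a sample would need an explicit rule.

```python
def rank_range_statistic(x, y):
    """Return (R_n, R_m, D) with D = R_n - R_m + m for samples x (size n) and y (size m)."""
    pooled = [(value, "x") for value in x] + [(value, "y") for value in y]
    pooled.sort(key=lambda item: item[0], reverse=True)    # rank 1 = largest value
    ranks_x = [rank for rank, (_, label) in enumerate(pooled, start=1) if label == "x"]
    ranks_y = [rank for rank, (_, label) in enumerate(pooled, start=1) if label == "y"]
    r_n = max(ranks_x) - min(ranks_x)
    r_m = max(ranks_y) - min(ranks_y)
    return r_n, r_m, r_n - r_m + len(y)

# Percentage fibre determinations from the example.
analyst_1 = [12.38, 12.53, 12.25, 12.37, 12.48, 12.43, 12.30, 12.46]
analyst_2 = [12.45, 12.31, 12.31, 12.20, 12.26, 12.73, 12.42, 12.17, 12.09, 12.23]

r_n, r_m, d = rank_range_statistic(analyst_1, analyst_2)
```

This gives rank ranges 12 and 17 and the value 5 for the statistic, as in the text. Ranking in ascending order instead leaves both ranges, and hence the statistic, unchanged.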





A two-sample distribution-free test 


Table 1. Percentage points of $D_{n,m}$

For the lower points, $P\{D_{n,m} \leq D_\alpha\} \leq \alpha$; for the upper points $P\{D_{n,m} \geq D_\alpha\} \leq \alpha$, with 100α = 0·5, 1·0,
2·5, 5·0. A stroke indicates that no value of $D_\alpha$ satisfies the inequality.

[Table body: lower and upper percentage points at the 5·0, 2·5, 1·0 and 0·5 % levels, tabulated by m + n (up to 20) and n.]
A. R. Kamat 379 
4. The probability distribution of $D_{n,m}$ follows from the application of simple combinatorial rules. The total number of ways in which the sequence can be arranged is $\binom{m+n}{n}$, and this will therefore be the totality of the values of $D_{n,m}$. This totality of values can also be built up as follows:

(a) $R_n = n-1$, $R_m = m-1$; $D_{n,m} = n$.

There are only two ways in which this can happen. Either the x-sample takes the first n places or the last n.

(b) $R_n = n-1+i$ $(i = 0, 1, \ldots, m-1)$, $R_m = m-1+n$; $D_{n,m} = i$.

Let the y-sample occupy the first and the last places and the x-sample n of the m+n-2 remaining places, remembering that $R_n$ must equal n-1+i. This can be done in
\[ (m-i-1)\binom{n+i-2}{n-2} \text{ ways.} \]
(See, for instance, Whitworth (1901), Choice and Chance, chapter III.)

(c) $R_n = n-1+m$, $R_m = m-1+j$ $(j = 0, 1, \ldots, n-1)$; $D_{n,m} = n+m-j$.
By symmetry the number of arrangements is
\[ (n-j-1)\binom{m+j-2}{m-2} \text{ ways.} \]

(d) $R_n = n-1+i$, $R_m = m-1+j$; $D_{n,m} = n+i-j$.
Excluding (a), (b) and (c) it can be shown that the number of ways will be
\[ 2\binom{i+j-2}{i-1} \quad (i = 1, 2, \ldots, m-1;\ j = 1, 2, \ldots, n-1). \]

Combining these four cases we may write
\[ P\{D_{n,m} = r\} = \binom{m+n}{n}^{-1} \Big\{ 2A_r \sum_{j=j_0}^{n-1} \binom{r-n+2j-2}{j-1} + 2B_r \sum_{j=j_0}^{m+n-r-1} \binom{r-n+2j-2}{j-1} + C_r (m-1-r)\binom{n+r-2}{n-2} + D_r (r-m-1)\binom{2m+n-r-2}{m-2} + 2E_r \Big\}, \quad (2) \]
where $j_0 = \max(1, n-r+1)$, $A_r = 1$ for $r \le m$, $B_r = 1$ for $r > m$, $C_r = 1$ for $r < m$, $D_r = 1$ for $r > m$, $E_r = 1$ for $r = n$, and they are zero otherwise. When the two samples are of the same size a certain simplification of the probability distribution function is possible. Write m = n, whence
\[ P\{D_{n,n} = r\} = \binom{2n}{n}^{-1} \Big[ 2\sum_{j=n-r+1}^{n-1} \binom{r-n+2j-2}{j-1} + C_r (n-1-r)\binom{n+r-2}{n-2} + 2E_r \Big] \]
for r = 0, 1, ..., n. $C_r = 1$ for $r < n$, $E_r = 1$ for $r = n$, and they are both zero otherwise. Values of the percentage points for m+n ≤ 20 are given in Table 1. They are calculated from the distribution function of $D_{n,m}$ given above.

The calculation of percentage points from the exact distribution clearly becomes 
impracticable when the sequence becomes at all large. 
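
For small samples the counting argument can be checked by brute force. The following Python sketch is an illustration rather than anything taken from the paper: it enumerates every arrangement, taking $D = R_n - R_m + m$ (the form implied by cases (a) to (d)), with $R_n$ and $R_m$ the ranges of the positions occupied by the x-sample and the y-sample. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import comb


def exact_distribution(n, m):
    """Count arrangements by value of D = R_n - R_m + m (assumed definition)."""
    total_places = m + n
    counts = Counter()
    for x_pos in combinations(range(total_places), n):
        x_set = set(x_pos)
        y_pos = [k for k in range(total_places) if k not in x_set]
        r_n = x_pos[-1] - x_pos[0]   # range of the x-sample positions
        r_m = y_pos[-1] - y_pos[0]   # range of the y-sample positions
        counts[r_n - r_m + m] += 1
    return counts


def upper_point(n, m, alpha):
    """Smallest value D_alpha with P{D >= D_alpha} <= alpha, or None (a stroke)."""
    counts = exact_distribution(n, m)
    total = comb(m + n, n)
    tail, point = 0, None
    for d in sorted(counts, reverse=True):
        tail += counts[d]
        if tail / total > alpha:
            break
        point = d
    return point
```

With n = 4 and m = 16 only 4845 arrangements are involved, so the enumeration is immediate; it becomes impracticable for large sequences for the reason given in the text.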




5. The moments of the distribution of $D_{n,m}$ can be obtained from the distribution itself or by considering the moments of $R_n$ and $R_m$ and their cross-products. Whichever way is chosen, the combinatorial sums are easily reduced by making use of the relation
\[ \sum_{i=0}^{s} \binom{k+i}{k} = \binom{k+s+1}{k+1}, \]
although the reduction of the resulting polynomials is a long process. The first three moments of $D_{n,m}$ are
\[ \mu'_1 = m - \frac{2m}{n+1} + \frac{2n}{m+1}, \]
\[ \mu_2 = \frac{2m(n-1)(m+n+1)}{(n+1)^2(n+2)} + \frac{2n(m-1)(m+n+1)}{(m+1)^2(m+2)} + \frac{8mn}{(m+1)(n+1)} - 4 + 4\Big/\binom{m+n}{n}, \]
\[ \mu_3 = \text{[illegible in source; the legible fragments are a term with denominator } (m+1)^3(m+2)(m+3)\text{, the term } \frac{24(m-n)(m-1)(n-1)\{(m+n+1)(m+n+2)-mn\}}{(m+1)^2(n+1)^2(m+2)(n+2)}\text{, and a term in } 12(m-n) \text{ involving } 4mn \text{ and } 1\Big/\binom{m+n}{n}\text{]} \quad (4) \]

The fourth moment was also worked out but is not reproduced here. For the symmetrical case
\[ \mu'_1 = n, \]
\[ \mu_2 = \frac{4(3n^2-4n-2)}{(n+1)(n+2)} + 4\Big/\binom{2n}{n}, \quad (5) \]
\[ \mu_3 = 0. \]
[A further expression, apparently for $\mu_4$ with denominator containing $(n+1)(n+2)(n+3)$, is illegible in source.]
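
As a numerical check, the mean and variance can be evaluated directly. The sketch below is illustrative Python, not taken from the paper. It reads the first two lines of equation (4) as $\mu'_1 = m - 2m/(n+1) + 2n/(m+1)$ and $\mu_2$ as the sum of the two range variances, $8mn/\{(m+1)(n+1)\}$, $-4$ and $4/\binom{m+n}{n}$. That reading is an assumption made from a damaged scan; it reproduces the entries of Tables 2 and 3 and the worked example for m = 25, n = 15. The function names are hypothetical.

```python
from math import comb, sqrt


def mean_D(n, m):
    """Expectation of D_{n,m} under the assumed reading of equation (4)."""
    return m - 2 * m / (n + 1) + 2 * n / (m + 1)


def var_D(n, m):
    """Variance of D_{n,m} under the assumed reading of equation (4)."""
    N = m + n
    return (2 * m * (n - 1) * (N + 1) / ((n + 1) ** 2 * (n + 2))
            + 2 * n * (m - 1) * (N + 1) / ((m + 1) ** 2 * (m + 2))
            + 8 * m * n / ((m + 1) * (n + 1))
            - 4
            + 4 / comb(N, n))


def sd_D(n, m):
    """Standard deviation of D_{n,m}."""
    return sqrt(var_D(n, m))
```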


6. Figs. 1 and 2 show the form of the distribution of $D_{n,m}$ for m+n = 16, m = 8 and 12. The distribution is trimodal and it is not expected that the beta ratios, some values for which are given in Tables 2 and 3, will be very helpful in deriving approximations by which to extend the percentage limits of Table 1 beyond m+n = 20.* Some light is, however, thrown on the problem by considering the limiting distribution of $D_{n,m}$.


7. The limiting distribution has been obtained by D. E. Barton in an addendum (p. 386) following this paper. Let m+n = N and let N increase so that
\[ m/N \to p, \qquad n/N \to q = 1-p. \]
If we write $d = D_{n,m} - m$, the limiting distribution of d is
\[ f(d) = p^{-d} q^{2} \{2p^2/(1-Q) - 1 - d\} \quad (d < 0) \]
\[ \phantom{f(d)} = 2Q^2/(1-Q) \quad (d = 0) \qquad (6) \]
\[ \phantom{f(d)} = q^{d} p^{2} \{2q^2/(1-Q) - 1 + d\} \quad (d > 0), \]


* Use of the Pearson-Merrington table of standardized deviates (Pearson & Hartley, 1954, Table 42) with the correct mean, standard deviation and beta values and the application of a continuity correction, has been found to give surprisingly accurate estimates of the 2.5 % points at m+n = 16. The possibilities of this method of approximation, however, need further investigation.











Fig. 1. Distribution of $D_{n,m}$ for case m+n = 16, m = 8. [Plot of the true distribution and the standardized limiting distribution; abscissa, scale of D; ordinate, scale of f(D).]

Fig. 2. Distribution of $D_{n,m}$ for case m+n = 16, m = 12. [Plot of the true distribution and the standardized limiting distribution; abscissa, scale of D; ordinate, scale of f(D).]



where Q = pq. When p = q = 1/2, we have for the symmetrical case
\[ f(d) = \tfrac{1}{6} \quad (d = 0) \]
\[ \phantom{f(d)} = \tfrac{1}{3}(3|d|-1)\left(\tfrac{1}{2}\right)^{|d|+2} \quad (d \ne 0). \]


The first four moments of the limiting distribution can be derived from the values obtained in the finite case. They are as follows:
\[ E(d) = 2(q-p)Q^{-1}, \]
\[ \mu_2(d) = 4 - 6Q^{-1} + 2Q^{-2}, \]
\[ \mu_3(d) = 2(q-p)\{Q^{-1} - 5Q^{-2} + 2Q^{-3}\}, \qquad (7) \]
\[ \mu_4(d) = 28 - 174Q^{-1} + 266Q^{-2} - 144Q^{-3} + 24Q^{-4}. \]


Table 2. Momental constants of $D_{n,m}$ for case m+n = 16

 n    m    Mean     S.D.    μ2        √β1      β2
 2   14   4.9333   3.805   14.4789    0.669   2.681
 3   13   6.9250   3.616   13.0765    0.178   2.340
 4   12   7.8154   3.271   10.7022    0.000   2.404
 5   11   8.1667   2.993    8.9567   -0.069   2.545
 6   10   8.2338   2.799    7.8355   -0.072   2.599
 7    9   8.1500   2.687    7.2188   -0.044   2.615
 8    8   8.0000   2.650    7.0225    0.000   2.618








Table 3. Momental constants of $D_{n,m}$ (symmetrical case)

 n = m   S.D.     μ2        β2
   2     1.155    1.3333   3.000
   3     1.673    2.8000   2.500
   4     2.014    4.0571   2.413
   5     2.250    5.0635   2.434
   7     2.550    6.5012   2.554
  10     2.796    7.8182   2.732
  15     2.998    8.8953   2.955
   ∞     3.464   12.0000   3.583

Note. For n = m the distribution is symmetrical, with mean at n.


It will be seen that if m and n are large and equal, i.e. p = q = 1/2 and Q = pq = 1/4, the limiting values of the second and fourth moments and of the $\beta_2$ ratio become 12, 516 and 3 7/12 respectively.


8. Table 4 shows some values of the moment constants for the distribution of d. If these are compared with values given in Tables 2 and 3 it becomes clear that the limiting form has been nowhere nearly reached when m+n = 16. Nevertheless there is some similarity in shape, at any rate for m and n not too unequal, which is brought out when the limiting distribution is adjusted to have the same mean and standard deviation as the distribution of $D_{n,m}$. This is shown in Figs. 1 and 2 in which limiting distributions with p = 1/2 and 3/4 have been fitted onto the distributions with m+n = 16, m = 8, 12. The agreement which can be said still to exist when m = 12 has vanished for m = 14 (as could be seen were a similar comparison made). At m = 12 and 14 the limiting distribution gives an appreciable frequency below zero, when standardized. The adjusted limiting distribution has, of course, smaller intervals between its ordinates than the distribution for finite N, since $\sigma_d > \sigma_D$, and this results in a greater number of ordinates for the former. Thus its frequency polygon is, in general, below that for D. This effect would be corrected if the cumulative distributions were compared.



Table 4. Moments and standardized deviates for the limiting distribution of d

Row                                  p = m/(m+n):   0.5           0.6            0.7            0.8            0.9

Momental constants
 1   Mean                                            0            -1.67          -3.81          -7.50         -17.78
 2   S.D.                                            3.46          3.70           4.56           6.68          13.57
 3   √β1                                             0            -0.49          -0.91          -1.20          -1.37
 4   β2                                              3.58          3.92           4.68           5.42           5.87

Upper 2.5 % point*
 5   d+                                              8 (0.017)     6 (0.015)      4 (0.021)      3 (0.015)      1 (0.012)
 6   d-                                              7 (0.030)     5 (0.031)      3 (0.050)      2 (0.043)      0 (0.030)

Lower 2.5 % point*
 7   d+                                             -7 (0.030)   -10 (0.028)    -15 (0.025)    -24 (0.027)    -52 (0.026)
 8   d-                                             -8 (0.017)   -11 (0.018)    -16 (0.019)    -25 (0.022)    -53 (0.024)

Corrected and standardized 2.5 % points
 9   Upper                                           1.99          1.77           1.57           1.44           1.29
10   Lower                                          -1.99         -2.20          -2.34          -2.46          -2.52

* The quantities d+ and d- are values of d which may be described as bracketing the 2.5 % point. The figures in parentheses are the probabilities (calculated from Barton's formulae): (i) P{d ≥ d+} and P{d ≥ d-} for the upper percentage point, and (ii) P{d ≤ d+} and P{d ≤ d-} for the lower percentage point.

















9. The following is a possible method of approximating to the percentage points for $D_{n,m}$ when m+n > 20, making use of the similarity referred to in the preceding paragraph.

(a) First 'standardize' the limiting distribution. The method is illustrated in the case of the 2.5 % points. Rows 5-8 of Table 4 show (for p = 0.5 (0.1) 0.9) the values of d which bracket the upper and lower 2.5 % points. Linear interpolation between d+ and d- gives approximate 2.5 % points to which a continuity correction of 0.5 has been applied. Thus for the lower percentage point when p = 0.8, P{d ≤ -24} = 0.027 and P{d ≤ -25} = 0.022. Linear interpolation gives -24.40 as a value for which P{d ≤ -24.40} = 0.025 were the distribution continuous. To this value we must add 0.50 as a continuity correction,* and finally have for the standardized deviate
\[ \frac{-24.40 + 0.50 - (-7.50)}{6.68} = -2.46, \]
a figure given in row 10 of the table.


* In most problems of this kind the limiting distribution is continuous and no continuity correction is required to find the standardized limiting percentage point. Here, both the distributions of d and D are discontinuous, but the standardized argument interval of the former does not agree with the unit interval of the latter distribution. The procedure suggested may be described as adding a correction for d and later removing it on a modified scale to get the significance point for D.








(b) We now apply to the standardized limits the values of the expectation and standard deviation of D, for any chosen finite values of m and n, giving

Percentage limit for D = E(D) + σ(D) × standardized limit for d. (9)

Thus in the case m+n = 20, m = 16 or p = 0.8 we have, using Table 4 and the expressions for $\mu'_1$ and $\sqrt{\mu_2}$ from equations (4),

Upper 2.5 % limit for D = 10.07 + 3.994 × 1.44 = 15.82.

Fig. 3. [Scale of D from 14 to 17, with the approximate limit 15.82 marked.]


Table 5. Comparison of approximate and true 2.5 % limits when m+n = 20

Columns: (1) p; (2) m; (3), (4) standardized limits from Table 4, upper and lower; (5), (6) mean D and $\sigma_D$ for the case m+n = 20; (7), (8) percentage points using standardizing factors, upper and lower; (9), (10) limits corrected for discontinuity, upper and lower; (11), (12) true limits from Table 1, upper and lower.

 (1)   (2)   (3)     (4)     (5)     (6)     (7)     (8)    (9)  (10)  (11)  (12)
 0.5   10   1.99   -1.99   10.00   2.796   15.56    4.44    17    3    16    4
 0.6   12   1.77   -2.20   10.56   2.912   15.71    4.15    17    3    17    4
 0.7   14   1.57   -2.34   10.80   3.288   15.96    3.11    17    2    18    3
 0.8   16   1.44   -2.46   10.07   3.994   15.82    0.25    17    -    19    1
 0.9   18   1.29   -2.52    6.21   4.719   12.30   -5.68    13    -    18    -

Note. The entries '-' in the last column and last column but two indicate that no value of D is significant at the lower 2.5 % level.


If D were continuously distributed, 15.82 would be the approximation to the percentage point; as it is not, and 15.82 > 15.50, we cannot include the value D = 16 in the 2½ % tail region, but must be content with the statement P{D ≥ 17} ≤ 0.025. The position is illustrated in Fig. 3.

10. Proceeding in this way estimates of upper and lower 2.5 % points for the case m+n = 20 were obtained as shown in Table 5. The true values, as given in Table 1, are shown for comparison in the final columns. For p = 0.5, 0.6 and 0.7 the estimates are not more than one unit in error,* but for p = 0.8 and 0.9 the method is clearly inadequate for m+n as low as 20. Some improvement can be obtained for these extreme values by using the form of limiting distribution suggested by Dr Barton at the end of his note (p. 387), but it is doubtful whether the test should be used at all in samples of this size with n < m/3.


* In the case p = 0.5 agreement is really very close; had the entry in col. 7 been ≤ 15.50 and that in col. 8 ≥ 4.50, the figures in cols. 9 and 10 would have been 16 and 4.













11. To sum up, for m+n > 20 and m/(m+n) not greater than, say, 0.75, approximate limits can be obtained from equation (9), using values of the expectation and standard deviation of $D_{n,m}$ found from the expressions for $\mu'_1$ and $\mu_2$ given in equation (4). Values of upper and lower 2.5 % standardized limits for d can be obtained with sufficient accuracy by interpolating in the following table.


Table 6. Corrected and standardized 2.5 % limits for $d = D_{n,m} - m$

 p = m/(m+n)    0.50    0.55    0.60    0.65    0.70    0.75
 Upper          1.99    1.88    1.77    1.67    1.57    1.50
 Lower         -1.99   -2.10   -2.20   -2.28   -2.34   -2.41


As an example, we may find approximate 2.5 % limits for the case m = 25, n = 15. Here p = 0.625 and interpolation in Table 6 gives 1.72, -2.24 as the standardized limits. From equation (4) we find

Mean $D_{15,25}$ = 23.03, $\mu_2$ = 11.4236, S.D. = 3.380.

This gives as limits for D:

23.03 + 1.72 × 3.380 = 28.84 and 23.03 - 2.24 × 3.380 = 15.46,

or, correcting for discontinuity, we should take 30 and 14 as approximations to the upper and lower 2.5 % points. As 15.46 is only just below the borderline value of 15.50 we might reasonably regard 15 rather than 14 as the lower limit.
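
The arithmetic of this example can be laid out as a short Python sketch, given here as an illustration under stated assumptions rather than as the author's procedure. It takes Table 6 as transcribed above, reads the mean and variance of equation (4) as $m - 2m/(n+1) + 2n/(m+1)$ and the corresponding three-fraction expression, and applies the discontinuity correction by taking the smallest integer at least 0.5 above the upper limit and the largest integer at least 0.5 below the lower limit. All function names are hypothetical.

```python
from math import ceil, comb, floor, sqrt

# Table 6 as transcribed: p = m/(m+n) and the standardized 2.5 % limits for d.
TABLE6_P = [0.50, 0.55, 0.60, 0.65, 0.70, 0.75]
TABLE6_UPPER = [1.99, 1.88, 1.77, 1.67, 1.57, 1.50]
TABLE6_LOWER = [-1.99, -2.10, -2.20, -2.28, -2.34, -2.41]


def interpolate(p, xs, ys):
    """Linear interpolation of ys at p; p must lie inside the tabulated range."""
    for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]):
        if x0 <= p <= x1:
            return y0 + (p - x0) * (y1 - y0) / (x1 - x0)
    raise ValueError("p lies outside the range of Table 6")


def approximate_limits(n, m):
    """Return (upper, lower, corrected upper, corrected lower) from equation (9)."""
    N = m + n
    mean = m - 2 * m / (n + 1) + 2 * n / (m + 1)
    var = (2 * m * (n - 1) * (N + 1) / ((n + 1) ** 2 * (n + 2))
           + 2 * n * (m - 1) * (N + 1) / ((m + 1) ** 2 * (m + 2))
           + 8 * m * n / ((m + 1) * (n + 1)) - 4 + 4 / comb(N, n))
    sd = sqrt(var)
    p = m / N
    upper = mean + interpolate(p, TABLE6_P, TABLE6_UPPER) * sd
    lower = mean + interpolate(p, TABLE6_P, TABLE6_LOWER) * sd
    # Discontinuity correction: an integer limit must clear the value by 0.5.
    return upper, lower, ceil(upper + 0.5), floor(lower - 0.5)
```

Calling `approximate_limits(15, 25)` gives limits close to 28.84 and 15.46 and the corrected pair 30 and 14. The preference for 15 as the lower limit, expressed at the end of the paragraph above, is a judgement on a borderline value and is not reproduced by the rule coded here.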


12. As mentioned in the introduction the Rosenbaum test can be used for the test of the hypothesis which we have set out. His test procedure consists of counting r, the number of observations of one sample which lie outside the extreme points of the other. It is interesting to note that when one sample (say, the m-sample) is wholly included within the extreme values of the other, the Rosenbaum r and D are connected by the relation
\[ D_{n,m} = m + r. \]

A comparison of the percentage points of the D-tables and r-tables will show that the upper significance points of m+r are never less than the corresponding significance points of $D_{n,m}$. It might be expected that the D-criterion will be more sensitive to deviations from the hypothesis tested than the Rosenbaum criterion, but this can only be decided definitely when the power functions of the two tests have been obtained.


Finally I wish to thank Prof. E. S. Pearson and Dr F. N. David for taking a keen interest in these investigations and for a number of helpful suggestions in the preparation of this paper.


REFERENCES


MANN, H. B. & WHITNEY, D. R. (1947). Ann. Math. Statist. 18, 50.

PEARSON, E. S. & HARTLEY, H. O. (1954). Biometrika Tables for Statisticians, 1. Cambridge University Press.

ROSENBAUM, S. (1953). Ann. Math. Statist. 24, 663.













THE LIMITING DISTRIBUTION OF KAMAT'S TEST STATISTIC
Addendum by D. E. BARTON, University College, London

The limiting distribution of Kamat's D criterion is easily obtained from the following argument. Kamat shows that the joint distribution of $i = R_n - n + 1$ and $j = R_m - m + 1$ is given by
\[ f(i,j) = 2K \quad (i = 0 = j) \]
\[ \phantom{f(i,j)} = K(m-i-1)\binom{n+i-2}{n-2} \quad (j = n,\ 0 \le i \le m-1) \]
\[ \phantom{f(i,j)} = K(n-j-1)\binom{m+j-2}{m-2} \quad (0 \le j \le n-1,\ i = m) \]
\[ \phantom{f(i,j)} = 2K\binom{i+j-2}{i-1} \quad (0 < j \le n-1,\ 0 < i \le m-1), \]
where $K^{-1} = \binom{m+n}{n}$.
The marginal distribution of i is consequently
\[ f(i) = K(m+1-i)\binom{n+i-2}{n-2} \quad (0 \le i \le m), \]
for which the mean and variance are
\[ \mu'_1 = m(n-1)/(n+1), \qquad \mu_2 = 2m(n-1)(m+n+1)/\{(n+2)(n+1)^2\}. \]

That of j is the same with n and m interchanged.
If we let N = m+n tend to infinity so that $m/N \to p$, $n/N \to q$, we may write the moments of i as
\[ \mu'_1 = Np - 2p/q + O(N^{-1}), \qquad \mu_2 = 2p/q^2 + O(N^{-1}). \]

Thus if r = m-i, s = n-j, d = s-r = D-m, the first two moments of r and s tend to finite limits and also, since the correlation coefficient of r and s tends to $-\sqrt{pq}$, so do the first two moments of d. The probability distribution functions, equally, tend to a proper limit. Thus
\[ f(r) = (r+1)\frac{m(m-1)\cdots(m-r+1)\,n(n-1)}{N(N-1)\cdots(N-r-1)} \to (r+1)p^r q^2, \]
and similarly the probability distribution function of r and s tends to
\[ f(r,s) = 2p^{r+1}q^{s+1} \quad (r > 0,\ s > 0) \]
\[ \phantom{f(r,s)} = (s-1)p^2 q^s \quad (r = 0,\ s > 0) \]
\[ \phantom{f(r,s)} = (r-1)p^r q^2 \quad (r > 0,\ s = 0). \]
The probability generating function
\[ \Pi(x,y) = \sum f(r,s)\,x^r y^s = \left\{ \frac{pq(x+y-xy)}{(1-px)(1-qy)} \right\}^2 \]
follows at once from this, whence, putting
\[ x = e^{-it}, \qquad y = e^{it}, \]
we have the limiting characteristic function of d as
\[ \phi_d(t) = \left\{ \frac{pq(1 - e^{it} + e^{2it})}{(e^{it}-p)(1-qe^{it})} \right\}^2. \]
From this it may be verified that the first moment of d is 2(q-p)/(pq) and the higher moments are the same as Kamat's limiting moments of D.

The actual limiting distribution of d is discrete and most simply got from the formula for f(r, s), i.e.
\[ f(d) = p^{-d}q^{2}\{2p^2 - (1-Q)(1+d)\}/(1-Q) \quad (d < 0) \]
\[ \phantom{f(d)} = 2Q^2/(1-Q) \quad (d = 0) \]
\[ \phantom{f(d)} = q^{d}p^{2}\{2q^2 - (1-Q)(1-d)\}/(1-Q) \quad (d > 0), \]
where Q = pq. This reduces when p = q = 1/2 to
\[ f(d) = \tfrac{1}{6} \quad (d = 0) \]
\[ \phantom{f(d)} = \tfrac{1}{3}(3|d|-1)\left(\tfrac{1}{2}\right)^{|d|+2} \quad (d \ne 0). \]



The tail probabilities may be obtained explicitly by summing the expressions for f(d). Thus
\[ \text{for } d > 0: \quad P(d) = P\{s-r \ge d\} = q^{d}\{pd - 1 + 2q/(1-Q)\}, \]
\[ \text{for } d < 0: \quad P(d) = P\{s-r \le d\} = p^{|d|}\{q|d| - 1 + 2p/(1-Q)\}. \]

These may be used to obtain limiting percentage points.
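
A short Python sketch of that use follows; it is an illustration and not part of the addendum. It takes the two tail formulae in the form $q^d\{pd - 1 + 2q/(1-Q)\}$ for d > 0 and $p^{|d|}\{q|d| - 1 + 2p/(1-Q)\}$ for d < 0, brackets the lower 2.5 % point, interpolates linearly, adds the continuity correction of 0.5, and standardizes with the limiting mean $2(q-p)/Q$ and variance $4 - 6/Q + 2/Q^2$. The function names are hypothetical, and because unrounded probabilities are used the final digit may differ from a value worked from rounded entries.

```python
from math import sqrt


def lower_tail(d, p):
    """P{s - r <= d} for an integer d < 0 in the limiting distribution."""
    q = 1.0 - p
    Q = p * q
    return p ** abs(d) * (q * abs(d) - 1.0 + 2.0 * p / (1.0 - Q))


def upper_tail(d, p):
    """P{s - r >= d} for an integer d > 0 in the limiting distribution."""
    q = 1.0 - p
    Q = p * q
    return q ** d * (p * d - 1.0 + 2.0 * q / (1.0 - Q))


def limiting_mean_sd(p):
    """Mean and standard deviation of the limiting distribution of d."""
    q = 1.0 - p
    Q = p * q
    return 2.0 * (q - p) / Q, sqrt(4.0 - 6.0 / Q + 2.0 / Q ** 2)


def standardized_lower_limit(p, alpha=0.025):
    """Corrected and standardized lower percentage point of d."""
    d = -1
    while lower_tail(d, p) > alpha:
        d -= 1
    if d == -1:
        raise ValueError("percentage point is not bracketed by negative d")
    d_minus, d_plus = d, d + 1
    p_minus, p_plus = lower_tail(d_minus, p), lower_tail(d_plus, p)
    point = d_plus - (p_plus - alpha) / (p_plus - p_minus)  # linear interpolation
    mean, sd = limiting_mean_sd(p)
    return (point + 0.5 - mean) / sd  # 0.5 is the continuity correction
```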
The modality of the probability distribution function is always triple, as it may be shown that (taking p < q without loss of generality) the following relation holds:
\[ \cdots < f(-3) < f(-2) > f(-1) < f(0) > f(1) < f(2) < \cdots < f(d_{\max}) > \cdots \]
where $d_{\max}$ is the integral part of $1 + \{p^{-1} - 2q^2/(1-pq)\}$.

The integer $d_{\max}$ equals 2 if p > 1/3, but rises rapidly for smaller p (taking the values 3, 4, 9 as p takes the values 0.3, 0.2, 0.1).
If n is kept finite but $m \to \infty$, a second limiting form may be obtained. In the notation used above
\[ E(s) = O(N^{-1}) = \operatorname{var}(s), \qquad E(r) = O(N), \qquad \operatorname{var}(r) = O(N^2), \]
and hence x = -d/N has the same finite limit as r/m. This is easily seen to be the Pearson Type I curve
\[ f(x) = n(n-1)x(1-x)^{n-2}. \]











[ 388 ] 


SEQUENTIAL ANALYSIS APPLIED TO CERTAIN EXPERIMENTAL 
DESIGNS IN THE ANALYSIS OF VARIANCE 


By W. D. RAY 


British Coal Utilization Research Association 


1. SUMMARY AND INTRODUCTION 


On the basis of some fundamental work due to Barnard (1952) and Cox (1952), Johnson 
(1953) has derived a procedure for applying sequential tests to the general linear hypothesis. 
Hoel (1955) obtains a similar test by a rather different method. 

The general linear hypothesis underlies a number of common analysis of variance situa- 
tions. It is the purpose of this paper to provide tables which will make it possible to carry 
out the suggested procedures in the cases of (a) one-way classification by groups, and
(b) randomized blocks. In the course of construction of these tables, we have considered
a number of approximations to the confluent hypergeometric function and assessed their 
accuracy. 

Conjectural approximations to the expected sample sizes of the sequential processes 
considered are also given. 





2. A SEQUENTIAL TEST OF THE GENERAL LINEAR HYPOTHESIS 


In the classical fixed sample case the general linear hypothesis may be expressed in matrix form as follows:

(i) Let $x = (x_1, \ldots, x_N)$ be N independent normal variables.
(ii) $E(x) = \theta C'$, where
\[ \theta = (\theta_1, \ldots, \theta_s) = (\theta_1, \ldots, \theta_{s-q} \mid \theta_{s-q+1}, \ldots, \theta_s) = (\theta_{(1)}, \theta_{(2)}), \]
\[ C = \begin{pmatrix} c_{11} & \cdots & c_{1s} \\ \vdots & & \vdots \\ c_{N1} & \cdots & c_{Ns} \end{pmatrix} = (C_{(1)} \mid C_{(2)}) \quad (s < N), \]
where C is partitioned similarly to $\theta$.
(iii) $V(x_i) = \sigma^2$ $(i = 1, \ldots, N)$.
(iv) The $\theta$'s and $\sigma$ are unknown parameters; C is known and is called the design matrix.
The hypothesis to be tested is $H_0$: $\theta_{(2)} = 0$. The likelihood ratio criterion is then a function of $G = S_2/S_1$, where $S_1$ is the minimum of $(x - \theta C')(x - \theta C')'$ with respect to $\theta$ and $S_1 + S_2$ is the minimum of $(x - \theta_{(1)}C'_{(1)})(x - \theta_{(1)}C'_{(1)})'$ with respect to $\theta_{(1)}$.
If observations are taken sequentially then at each stage we add a further set of x's, a new row of C being added for each additional set. Thus after N observations have been taken we may calculate a value $G^{(N)}$, say, of G.
Under certain conditions, which are satisfied in the applications considered in this paper, a sequential test comparing the composite hypotheses
\[ H_0: \theta_{(2)} = 0, \]
\[ H_1: \theta_{(2)} = \mu, \]







W. D. Ray 389 
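may be based on the likelihood ratio (Arnold, 1951)
\[ \frac{p(G^{(N)} \mid H_1)}{p(G^{(N)} \mid H_0)} = e^{-\frac{1}{2}\lambda^{(N)}} M\left( \frac{N-s+q}{2}, \frac{q}{2}; \frac{\frac{1}{2}\lambda^{(N)} G^{(N)}}{1 + G^{(N)}} \right), \quad (1) \]
where
\[ \lambda^{(N)} = \mu\, C'_{(2)} \{ I - C_{(1)} (C'_{(1)} C_{(1)})^{-1} C'_{(1)} \} C_{(2)}\, \mu' / \sigma^2 \]
and
\[ M(X, Y; u) = \sum_{j=0}^{\infty} \frac{\Gamma(X+j)\,\Gamma(Y)}{\Gamma(X)\,\Gamma(Y+j)} \frac{u^j}{j!}. \]

This procedure is exactly analogous therefore to that employed in comparing two simple hypotheses, i.e.

'Accept $H_1$ if $\dfrac{p(G^{(N)} \mid H_1)}{p(G^{(N)} \mid H_0)} \ge \dfrac{1-\beta}{\alpha}$.

Accept $H_0$ if $\dfrac{p(G^{(N)} \mid H_1)}{p(G^{(N)} \mid H_0)} \le \dfrac{\beta}{1-\alpha}$.

Otherwise take further observations.' $\alpha$, $\beta$ are the approximate chances of erroneous rejection of $H_0$, $H_1$, respectively.

Evidently for all values of $\mu$ giving the same value for the scalar quantity $\lambda$ there will be the same sequential procedure.

In the following sections we shall consider the application of this general method to the two special cases mentioned in § 1. In these cases we always have $\lambda = N\delta$, where $\delta$ does not depend on N, and so our alternative hypothesis $H_1$ can be defined in terms of $\delta$.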


3. ONE-WAY CLASSIFICATION BY GROUPS 


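This is the simplest design in the analysis of variance. The data are arranged in k groups, and it is desired to reach conclusions about possible differences between the means of the different groups. The sequential procedure which we shall consider consists of taking an equal number of observations from each group at each stage in the experiment. The theoretical model may be expressed in the following form:
\[ x_{ti} = a + b_t + z_{ti} \quad (t = 1, \ldots, k;\ i = 1, \ldots, n), \]
where
\[ E(z_{ti}) = 0, \qquad V(z_{ti}) = \sigma^2, \qquad \sum_{t=1}^{k} b_t = 0. \]
In this case N = kn, s = k and q = k-1:
\[ G^{(N)} = \frac{n \sum_{t=1}^{k} (\bar{x}_{t.} - \bar{x}_{..})^2}{\sum_{t=1}^{k} \sum_{i=1}^{n} (x_{ti} - \bar{x}_{t.})^2}, \qquad \lambda^{(N)} = \frac{n \sum b_t^2}{\sigma^2} \quad \text{and} \quad \delta = \frac{\sum b_t^2}{k\sigma^2}. \]

The procedure is now:

'Accept $H_1$ if $e^{-\frac{1}{2}\lambda^{(N)}} M\left( \dfrac{kn-1}{2}, \dfrac{k-1}{2}; \dfrac{\frac{1}{2}\lambda^{(N)} G^{(N)}}{1 + G^{(N)}} \right) \ge \dfrac{1-\beta}{\alpha}$.

Accept $H_0$ if $e^{-\frac{1}{2}\lambda^{(N)}} M\left( \dfrac{kn-1}{2}, \dfrac{k-1}{2}; \dfrac{\frac{1}{2}\lambda^{(N)} G^{(N)}}{1 + G^{(N)}} \right) \le \dfrac{\beta}{1-\alpha}$.

Otherwise take a further set of k (or mk) observations, one (or m) from each group.'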











390 Sequential analysis applied to certain experimental designs 
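This is equivalent to the conditions: (i) accept $H_1$ if $G^{(N)} \ge \bar{G}^{(N)}$; (ii) accept $H_0$ if $G^{(N)} \le \underline{G}^{(N)}$; (iii) if $\underline{G}^{(N)} < G^{(N)} < \bar{G}^{(N)}$ continue sampling, where $\underline{G}^{(N)}$, $\bar{G}^{(N)}$ are the solutions of the equations
\[ \frac{\beta}{1-\alpha},\ \frac{1-\beta}{\alpha} = e^{-\frac{1}{2}\lambda^{(N)}} M\left( \frac{kn-1}{2}, \frac{k-1}{2}; \frac{\frac{1}{2}\lambda^{(N)} G^{(N)}}{1 + G^{(N)}} \right). \]

The appropriate limits $\underline{G}^{(N)}$, $\bar{G}^{(N)}$ for $\delta$ = 0.5, 1.0 and 2.0 with probabilities of error $\alpha = \beta = 0.05$ and various values of k are given in Tables 1, 2 and 3 printed on pp. 399-401 below.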
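
The limits can also be found numerically without tables of the function. The Python sketch below is an illustration under stated assumptions, not the method used for the printed tables (which relied on inverse interpolation). It sums the power series for M directly, uses $\lambda = kn\delta$, and solves each equation for G by bisection, which is legitimate because the likelihood ratio increases with G. All names are hypothetical. A limit is returned as `None` when the equation has no positive solution at that stage.

```python
from math import log


def kummer_M(a, b, u, tol=1e-13, max_terms=5000):
    """Confluent hypergeometric function M(a, b; u) by its power series."""
    term = total = 1.0
    for j in range(max_terms):
        term *= (a + j) / (b + j) * u / (j + 1)
        total += term
        if abs(term) <= tol * abs(total):
            return total
    raise ArithmeticError("series did not converge")


def log_ratio(G, k, n, delta):
    """Log likelihood ratio for the one-way classification, lambda = k*n*delta."""
    lam = k * n * delta
    u = 0.5 * lam * G / (1.0 + G)
    return -0.5 * lam + log(kummer_M((k * n - 1) / 2, (k - 1) / 2, u))


def solve_limit(target, k, n, delta, g_max=1e6):
    """Bisection for the G at which the log ratio equals target, else None."""
    lo, hi = 0.0, g_max
    if not (log_ratio(lo, k, n, delta) < target < log_ratio(hi, k, n, delta)):
        return None
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if log_ratio(mid, k, n, delta) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def one_way_limits(k, n, delta, alpha=0.05, beta=0.05):
    """Return the pair (lower limit, upper limit) for G at stage n."""
    lower = solve_limit(log(beta / (1 - alpha)), k, n, delta)
    upper = solve_limit(log((1 - beta) / alpha), k, n, delta)
    return lower, upper
```

For k = 2, n = 6 and $\delta$ = 1 this gives limits close to 0.105 and 1.016.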


4, RANDOMIZED BLOCKS 


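The well-known model for randomized blocks with k 'blocks' and n 'varieties' is
\[ x_{ti} = a + b_t + v_i + z_{ti} \quad (t = 1, \ldots, k;\ i = 1, \ldots, n), \]
where
\[ \sum_{t=1}^{k} b_t = 0, \qquad \sum_{i=1}^{n} v_i = 0 \]
and
\[ E(z_{ti}) = 0, \qquad V(z_{ti}) = \sigma^2. \]

The sequential procedure with this design, having fixed the number of varieties n, is to decide at each stage whether or not to add another block of the same n varieties.

Hence we have N = kn, s = k+n-1 and if $H_0$ be the hypothesis $v_i = 0$ $(i = 1, \ldots, n)$, then q = n-1;
\[ G^{(N)} = \frac{k \sum_{i=1}^{n} (\bar{x}_{.i} - \bar{x}_{..})^2}{\sum_{t=1}^{k} \sum_{i=1}^{n} (x_{ti} - \bar{x}_{t.} - \bar{x}_{.i} + \bar{x}_{..})^2}, \qquad \lambda^{(N)} = \frac{k \sum_{i=1}^{n} v_i^2}{\sigma^2} \quad \text{and} \quad \delta = \frac{\sum_{i=1}^{n} v_i^2}{n\sigma^2}. \]

Analogously to the case of the one-way classification by groups, the sequential procedure may now be defined by: (i) accept $H_1$ if $G^{(N)} \ge \bar{G}^{(N)}$, (ii) accept $H_0$ if $G^{(N)} \le \underline{G}^{(N)}$, (iii) otherwise add another block to the experiment. In this case $\underline{G}^{(N)}$, $\bar{G}^{(N)}$ are the solutions of the following equations in $G^{(N)}$:
\[ \frac{\beta}{1-\alpha},\ \frac{1-\beta}{\alpha} = e^{-\frac{1}{2}\lambda^{(N)}} M\left( \frac{k(n-1)}{2}, \frac{n-1}{2}; \frac{\frac{1}{2}\lambda^{(N)} G^{(N)}}{1 + G^{(N)}} \right). \]
The appropriate limits $\underline{G}^{(N)}$, $\bar{G}^{(N)}$ for $\delta$ = 0.5, 1.0 and 2.0 with $\alpha = \beta = 0.05$ and various values of n are given in Tables 4, 5 and 6 printed on pp. 402-403 below.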


5. DETERMINATION OF THE LIMITS $\underline{G}$, $\bar{G}$


The determination of these limits entails the use of fairly extensive tables of the confluent hypergeometric function M(X, Y; u). In this connexion Rushton (1954) and Rushton & Lang (1954) have provided a fairly comprehensive survey and tabulation of the function. For the values covered by Rushton & Lang's tables, $\underline{G}$ and $\bar{G}$ were calculated by inverse interpolation of the function log M(X, Y; u) with regard to u.

Unfortunately, the tables were not sufficiently extensive to cover all the cases it was necessary to consider. For example, in the one-way classification by groups for even (odd) k only limits at stages of even (odd) n could be determined, while in the case of randomized blocks only at odd (even) k for even (odd) n. So, too, it was found for some k and n that not close enough intervals of u had been tabulated to obtain rapid enough convergence of the inverse interpolation process.

In order to cover these situations, attempts were made to obtain useful approximations to the likelihood ratio or equivalently to the confluent hypergeometric function.


6. APPROXIMATIONS TO THE LIKELIHOOD RATIO 


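(a) The likelihood ratio R is the ratio of a non-central F (or G) distribution to a central F (or G) distribution where the degrees of freedom of the F's are $\nu_1 = q$, $\nu_2 = N-s$ and the non-centrality parameter is $\lambda$.

Thus
\[ R = \left[ \frac{p(F' \mid H_1)}{p(F \mid H_0)} \right]_{F'=F} = \left[ \frac{p(G' \mid H_1)}{p(G \mid H_0)} \right]_{G'=G}, \]
where $F = \nu_2 G/\nu_1$ and $F' = \nu_2 G'/\nu_1$.
Now
\[ p(G' \mid H_1) = e^{-\frac{1}{2}\lambda} \frac{\Gamma(\frac{1}{2}(q+N-s))}{\Gamma(\frac{1}{2}q)\,\Gamma(\frac{1}{2}(N-s))} \frac{G'^{\frac{1}{2}q-1}}{(1+G')^{\frac{1}{2}(q+N-s)}} M\left( \tfrac{1}{2}(N-s+q), \tfrac{1}{2}q; \frac{\frac{1}{2}\lambda G'}{1+G'} \right), \]
\[ p(G \mid H_0) = \frac{\Gamma(\frac{1}{2}(q+N-s))}{\Gamma(\frac{1}{2}q)\,\Gamma(\frac{1}{2}(N-s))} \frac{G^{\frac{1}{2}q-1}}{(1+G)^{\frac{1}{2}(q+N-s)}}. \]
Thus
\[ R = e^{-\frac{1}{2}\lambda} M\left( \tfrac{1}{2}(N-s+q), \tfrac{1}{2}q; \frac{\frac{1}{2}\lambda G}{1+G} \right). \quad (1\ bis) \]

Patnaik (1949) has approximated to the non-central F-distribution by a central F-distribution having the same first two moments.

In the present instance the F' distribution in the numerator of R would be approximated to by a $kF_{\nu,\nu_2}$ distribution, where $k = (q+\lambda)/q$, $\nu = (q+\lambda)^2/(q+2\lambda)$. Hence
\[ p(G' \mid H_1) \simeq \frac{\Gamma(\frac{1}{2}(\nu+\nu_2))}{\Gamma(\frac{1}{2}\nu)\,\Gamma(\frac{1}{2}\nu_2)} \left( \frac{\nu}{\nu_1+\lambda} \right)^{\frac{1}{2}\nu} \frac{G'^{\frac{1}{2}\nu-1}}{\left(1 + \dfrac{\nu G'}{\nu_1+\lambda}\right)^{\frac{1}{2}(\nu+\nu_2)}}, \]
while
\[ p(G \mid H_0) = \frac{\Gamma(\frac{1}{2}(\nu_1+\nu_2))}{\Gamma(\frac{1}{2}\nu_1)\,\Gamma(\frac{1}{2}\nu_2)} \frac{G^{\frac{1}{2}\nu_1-1}}{(1+G)^{\frac{1}{2}(\nu_1+\nu_2)}}, \]
and thus
\[ R \simeq \frac{\Gamma(\frac{1}{2}\nu_1)\,\Gamma(\frac{1}{2}(\nu+\nu_2))}{\Gamma(\frac{1}{2}\nu)\,\Gamma(\frac{1}{2}(\nu_1+\nu_2))} \left( \frac{\nu}{\nu_1+\lambda} \right)^{\frac{1}{2}\nu} \frac{G^{\frac{1}{2}(\nu-\nu_1)} (1+G)^{\frac{1}{2}(\nu_1+\nu_2)}}{\left(1 + \dfrac{\nu G}{\nu_1+\lambda}\right)^{\frac{1}{2}(\nu+\nu_2)}}, \quad (3) \]
where $\nu = (\nu_1+\lambda)^2/(\nu_1+2\lambda)$.

In virtue of (1 bis) this can also be regarded as an approximation to the confluent hypergeometric function.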
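
Approximation (3) is easy to evaluate on the logarithmic scale. The Python sketch below is an illustration, not the author's computation. It assumes that (3) has the form obtained by replacing the non-central density of G by that of a multiple of a central variance ratio with $\nu = (\nu_1+\lambda)^2/(\nu_1+2\lambda)$ and scale factor $\nu/(\nu_1+\lambda)$, and it solves for the limits by bisection rather than by the Newton-Raphson method. The function names are hypothetical.

```python
from math import lgamma, log


def log_ratio_patnaik(G, nu1, nu2, lam):
    """Log of approximation (3) to the likelihood ratio (assumed form)."""
    nu = (nu1 + lam) ** 2 / (nu1 + 2.0 * lam)
    c = nu / (nu1 + lam)
    return (lgamma(0.5 * (nu + nu2)) + lgamma(0.5 * nu1)
            - lgamma(0.5 * nu) - lgamma(0.5 * (nu1 + nu2))
            + 0.5 * nu * log(c)
            + 0.5 * (nu - nu1) * log(G)
            + 0.5 * (nu1 + nu2) * log(1.0 + G)
            - 0.5 * (nu + nu2) * log(1.0 + c * G))


def solve_patnaik(target, nu1, nu2, lam, g_min=1e-12, g_max=1e6):
    """Bisection for G; the log ratio is increasing in G. None if not bracketed."""
    lo, hi = g_min, g_max
    if not (log_ratio_patnaik(lo, nu1, nu2, lam) < target
            < log_ratio_patnaik(hi, nu1, nu2, lam)):
        return None
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if log_ratio_patnaik(mid, nu1, nu2, lam) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def patnaik_limits_one_way(k, n, delta, alpha=0.05, beta=0.05):
    """Approximate limits for the one-way case: nu1 = k-1, nu2 = k(n-1)."""
    nu1, nu2, lam = k - 1, k * (n - 1), k * n * delta
    lower = solve_patnaik(log(beta / (1 - alpha)), nu1, nu2, lam)
    upper = solve_patnaik(log((1 - beta) / alpha), nu1, nu2, lam)
    return lower, upper
```

For k = 2, n = 6 and $\delta$ = 1 the sketch gives values near 0.138 and 1.00, in line with the approximate limits quoted for that case in Table 7.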

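(b) While the above approximation (3) holds for all $\nu_1$, $\nu_2$ it was considered worth while to investigate whether any simplification resulted as $\nu_2 \to \infty$. Patnaik (1949) has shown that the first two moments of F' are
\[ \mu'_1(F') = \frac{\nu_1+\lambda}{\nu_1} \left(1 - \frac{2}{\nu_2}\right)^{-1}, \]
\[ \mu_2(F') = \frac{2}{\nu_1^2} \left(1 - \frac{2}{\nu_2}\right)^{-1} \left(1 - \frac{4}{\nu_2}\right)^{-1} \left\{ \frac{(\nu_1+\lambda)^2}{\nu_2 - 2} + (\nu_1 + 2\lambda) \right\}. \]

Hence for $\nu_2$ large (but not infinite)
\[ \mu_2(F') \simeq \frac{2(\nu_1+2\lambda)}{\nu_1^2} \left(1 - \frac{2}{\nu_2}\right)^{-2}, \]
and therefore $\nu_1(1 - 2/\nu_2)F'$ has moments $\mu'_1 = \nu_1 + \lambda$, $\mu_2 = 2(\nu_1 + 2\lambda)$. We therefore approximate to the distribution of F' by regarding $\nu_1(1 - 2/\nu_2)F'$ as being distributed approximately as $\chi'^2$ with $\nu_1$ degrees of freedom and non-centrality parameter $\lambda$.
Inserting the corresponding approximations for the distribution of G, we obtain
\[ R \simeq e^{-\frac{1}{2}\lambda} \sum_{r=0}^{\infty} \frac{\Gamma(\frac{1}{2}\nu_1)}{\Gamma(\frac{1}{2}\nu_1 + r)} \frac{\{\frac{1}{4}\lambda(\nu_2-2)G\}^r}{r!} = e^{-\frac{1}{2}\lambda}\, \Gamma(\tfrac{1}{2}\nu_1)\, \frac{I_{\frac{1}{2}\nu_1-1}(\{\lambda(\nu_2-2)G\}^{\frac{1}{2}})}{(\frac{1}{2}\{\lambda(\nu_2-2)G\}^{\frac{1}{2}})^{\frac{1}{2}\nu_1-1}}, \quad (4) \]
where $I_{\frac{1}{2}\nu_1-1}$ is a Bessel function of imaginary argument.
When $\nu_1 = 1$, that is, for example, when k = 2 in the one-way classification by groups,
\[ R \simeq e^{-\frac{1}{2}\lambda}\, \Gamma(\tfrac{1}{2})\, \frac{I_{-\frac{1}{2}}(\{\lambda(\nu_2-2)G\}^{\frac{1}{2}})}{(\frac{1}{2}\{\lambda(\nu_2-2)G\}^{\frac{1}{2}})^{-\frac{1}{2}}} = e^{-\frac{1}{2}\lambda} \cosh \{\lambda(\nu_2-2)G\}^{\frac{1}{2}}. \quad (5) \]

This may be compared with an approximation to the confluent hypergeometric function given by Arnold (1951).
Using the relationship
\[ I_X(\zeta) = \frac{(\frac{1}{2}\zeta)^X}{\Gamma(X+1)}\, e^{\zeta} M(X+\tfrac{1}{2}, 2X+1; -2\zeta), \]
and Kummer's formula $M(X, Y; z) = e^{z} M(Y-X, Y; -z)$, we obtain (4) in the form
\[ R \simeq e^{-\frac{1}{2}\lambda}\, e^{-\{\lambda(\nu_2-2)G\}^{1/2}} M(\tfrac{1}{2}\nu_1 - \tfrac{1}{2}, \nu_1 - 1; 2\{\lambda(\nu_2-2)G\}^{\frac{1}{2}}). \]

Comparing (1 bis) and (4) it may be noted that we have in fact approximated to a confluent hypergeometric function by a similar function of a slightly different form.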


7. CONSTRUCTION OF THE TABLES OF LIMITS $\underline{G}$, $\bar{G}$


In order to decide on the extent of tabulation required, the results of Baker (1950) were taken as a rough guide. In an empirical investigation into the distribution of sample size in sequential tests comparing certain simple hypotheses he found that in less than 1 in 20 cases did the actual observed sample size exceed three times the expected sample size. The tables were accordingly constructed to cover sample sizes up to three times the expected sample size, and it is to be anticipated that they will cover most of the cases encountered in practice.

Wherever possible the logarithm of the expression (1 bis) was equated in turn to log{(1-β)/α}, log{β/(1-α)}, and the solutions $\bar{G}$, $\underline{G}$ respectively obtained by inverse interpolation. By far the greater content of Tables 1-6 was arrived at by this method. To produce certain intermediate values when, in the case of the one-way classification, n was small or when the range of the tables of the confluent hypergeometric function was exceeded, recourse was made to approximation (3). Solutions $\underline{G}'$, $\bar{G}'$ obtained by the Newton-Raphson iterative method applied to approximation (3) were of great assistance in approximating to the exact values. It was found that $\bar{G}'$ was a very close estimator of $\bar{G}$ but that $\underline{G}'$ was not equivalently as good an estimator of $\underline{G}$, at least for small n (or k in the case of randomized blocks). However, it was found that the rate of change of the limits for increasing n (or k) was almost identical using the correct form (1 bis) or the approximate one (3). Thus knowledge of the approximate limits and their differences for increasing n (or k) could be utilized to obtain good estimates of the true limits providing a few exact values had already been obtained for small n (or k). Estimates obtained this way are printed in italic type in Tables 1-6.

Table 7. Comparison of limits obtained from equations (1 bis) and (3)
One-way classification by groups. δ = 1, α = β = 0.05.

k = 2
  n    λ    G (lower)   G (upper)   G' (lower)   G' (upper)
  4    8     0.052       2.839       0.096        2.410
  6   12     0.105       1.016       0.138        1.001
  8   16     0.135       0.710       0.164        0.700
 10   20     0.150       0.578       0.179        0.580
 12   24     0.166       0.504       0.195        0.502

k = 5
  n    λ    G (lower)   G (upper)   G' (lower)   G' (upper)
  3   15     0.331       2.469       0.348        2.497
  5   25     0.340       0.973       0.357        0.958
  7   35     0.331       0.687       0.347        0.685
  9   45     0.322       0.565       0.338        0.568
 11   55     0.314       0.498       0.339        0.503

$\underline{G}$, $\bar{G}$ were obtained from (1 bis); $\underline{G}'$, $\bar{G}'$ from (3).

Table 8. Comparison of limits obtained from equations (1 bis) and (5)
One-way classification by groups. δ = 0.5, α = β = 0.05.

k = 2
  n    λ    G (lower)     G (upper)   G'' (lower)   G'' (upper)
  6    6     0.002         1.091       0.012         0.918
  8    8     0.025         0.639       0.032         0.608
 10   10     0.040         0.471       0.047         0.466
 12   12     0.050         0.385       0.058         0.387
 14   14     0.059         0.333       0.067         0.337
 16   16     0.066         0.300       0.074         0.302
 18   18     [illegible]   0.276       0.079         0.277
 20   20     0.076         0.258       0.083         0.258
 30   30     0.089         0.205       0.097         0.207

$\underline{G}$, $\bar{G}$ were obtained from (1 bis); $\underline{G}''$, $\bar{G}''$ from (5).



The approximation (4) has not been investigated numerically, except for the case when $\nu_1 = 1$. It is found that it gives reasonable agreement of the limits when n is fairly large. Once again, however, the rate of change of limits with increasing n (or k) is similar whether (1 bis) or (5) is used to obtain them.

Thus either approximation may be used together with some known exact limits obtained 
from (1 bis) to expand or interpolate where required in Tables 1-6. (Tables 7 and 8 give a 
few examples of the accuracy of these approximations.) 

It may be noted from Tables 1—6 that in certain cases no decision is possible before a certain 
stage. This is because G by definition must be real, finite and positive. Thus in some practical 
situations an initial sample must be taken before the sequential procedure can be put into 
operation. 


8. THE PRACTICAL USE OF THE TABLES 
Example 1. Suppose we require to take observations from each of three groups and
postulate that the model of §3 might describe the situation. We wish to test the null
hypothesis that each group mean is zero against the alternative hypothesis that



The probabilities of error α, β are taken to be equal to 0.05.


Table 9

           Group 1                  Group 2                  Group 3             Sum of squares
 n     x     Σx     Σx²        x     Σx     Σx²        x     Σx     Σx²       Between   Within     G
 1    0.6    0.6    0.36      2.3    2.3    5.29     -7.8   -7.8   60.84        -         -        -
 2   -4.4   -3.8   19.72     13.4   15.7  184.85      9.3    1.5  147.33        -         -        -
 3   15.1   11.3  247.73      3.8   19.5  199.29     13.4   14.9  326.89        -         -        -
 4    2.8   14.1  255.57      5.7   25.2  231.78     -6.1    8.8  364.10        -         -        -
 5   -6.1    8.0  292.78      9.9   35.1  329.79      8.1   16.9  429.71      76.32    735.96    0.104
 6    3.0   11.0  301.78      5.7   40.8  362.28     10.1   27.0  531.72      74.14    776.67    0.095
 7    0.6   11.6  302.14     -1.1   39.7  363.49      2.9   29.9  540.13      58.12    833.67    0.070

(For each group the columns give the observation x, the running total Σx and the running sum of
squares Σx².)




Three samples from normal populations whose means are zero and whose standard 
deviations are 10 have been chosen to illustrate the procedure. 

In Table 9 the approximate calculations are shown, a value of G being calculated at each
stage of sampling after n = 5. Table 1 is consulted until one or other of the corresponding 
limits is violated. The example considered results in a correct decision in favour of the null 
hypothesis when n = 7. 
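
The arithmetic of Example 1 can be retraced with a short sketch. It assumes, as the figures of Table 9 suggest, that G is the ratio of the between-groups to the within-groups sum of squares, and that falling to or below the lower limit favours the null hypothesis while reaching the upper limit favours the alternative. The function names and the `LIMITS` dictionary are illustrative only; the limits are the k = 3 entries of Table 1 for n = 5 and n = 7 (n = 6 is not tabulated, so sampling simply continues there).

```python
# Observations transcribed from Table 9 (three groups, seven stages).
GROUPS = [
    [0.6, -4.4, 15.1, 2.8, -6.1, 3.0, 0.6],
    [2.3, 13.4, 3.8, 5.7, 9.9, 5.7, -1.1],
    [-7.8, 9.3, 13.4, -6.1, 8.1, 10.1, 2.9],
]

# (lower, upper) limits for k = 3 copied from Table 1; n = 6 is not tabulated.
LIMITS = {5: (0.037, 1.319), 7: (0.072, 0.696)}


def g_statistic(groups, n):
    """Between-groups / within-groups sum of squares after n observations per group."""
    k = len(groups)
    sums = [sum(g[:n]) for g in groups]
    total_sq = sum(x * x for g in groups for x in g[:n])
    group_term = sum(s * s for s in sums) / n
    correction = sum(sums) ** 2 / (k * n)
    between = group_term - correction
    within = total_sq - group_term
    return between / within


def run_test(groups, limits, start=5):
    """Illustrative driver: stop at the first tabulated stage where a limit is crossed.

    Assumption: G <= lower limit accepts H0, G >= upper limit rejects it.
    """
    for n in range(start, len(groups[0]) + 1):
        g = g_statistic(groups, n)
        if n not in limits:
            continue  # no tabulated limits at this stage: keep sampling
        lower, upper = limits[n]
        if g <= lower:
            return n, "accept H0"
        if g >= upper:
            return n, "reject H0"
    return None, "no decision"


if __name__ == "__main__":
    for n in (5, 6, 7):
        print(n, round(g_statistic(GROUPS, n), 3))
    print(run_test(GROUPS, LIMITS))
```

Under these assumptions the sketch reproduces the last column of Table 9 and stops at n = 7 in favour of the null hypothesis.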

Example 2. A practical application in which the randomized block design of §4 may 
be used, occurs when sampling for dust concentration in a rectangular duct (see Fig. 1). 
Sampling probes can be transversely inserted in four positions P_1, P_2, P_3, P_4 and collections
of dust made over the width of the duct. It is required to determine whether or not the effect 
of gravity on the dust flow is creating stratification from top to bottom. Sampling is carried 
















W. D. Ray 395 


out at each of the four positions in random order repeating the scheme after a series of 
fixed time intervals. The accuracy of measurement of dust concentration is known before- 
hand. 

In this situation the ‘varieties’ correspond to the four positions and the ‘blocks’ to the 
elements of the time sequence. The hypothesis of stratification is numerically interpreted
as a particular value of δ, and the sampling procedure is carried out until a decision is
reached on its validity.

[Figure: cross-section of the duct; the labels 'Top', 'Gravity' and 'Probe' survive from the original drawing.]

Fig. 1. Cross-section of rectangular duct.


9. EXPECTED SAMPLE NUMBER 


Although no general formula for the expected sample number for composite hypotheses 
exists, Bhate (1954) conjectures that the natural generalization of Wald’s formula: 


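\[
\mathcal{E}[\log R \mid N = \mathcal{E}(N)] =
\begin{cases}
(1-\alpha)\log\dfrac{\beta}{1-\alpha} + \alpha\log\dfrac{1-\beta}{\alpha} & \text{if } H_0 \text{ is true},\\[2ex]
\beta\log\dfrac{\beta}{1-\alpha} + (1-\beta)\log\dfrac{1-\beta}{\alpha} & \text{if } H_1 \text{ is true},
\end{cases}
\tag{6}
\]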





where R is the likelihood ratio, may give useful results. 
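
As a small numerical illustration, the right-hand side of (6) can be evaluated directly. The sketch below assumes logarithms to base 10, which is the base used in the equations of this section; the function name is illustrative only.

```python
import math


def bhate_rhs(alpha, beta, h0_true=True):
    """Right-hand side of the conjectured formula (6), logarithms to base 10."""
    accept = math.log10(beta / (1.0 - alpha))   # log of the lower Wald boundary
    reject = math.log10((1.0 - beta) / alpha)   # log of the upper Wald boundary
    if h0_true:
        return (1.0 - alpha) * accept + alpha * reject
    return beta * accept + (1.0 - beta) * reject


if __name__ == "__main__":
    print(round(bhate_rhs(0.05, 0.05, h0_true=True), 2))
    print(round(bhate_rhs(0.05, 0.05, h0_true=False), 2))
```

For α = β = 0.05 the value under the null hypothesis is close to -1.15, and the value under the alternative is equal and opposite.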

As a first approximation, using the method described below, the left-hand side of the 
equation was evaluated as the value of log R when N = E(N). The values so obtained are,
of course, both conjectural and approximate. In Table 10(a) and (b) they are compared 
with the sample size required for the (fixed sample) analysis of variance test having the same 
prescribed chances of error as the sequential test. The latter were found from the graphs of 
the power functions of analysis of variance tests given by Pearson & Hartley (1951). 

It will be seen that, as in other cases of sequential testing of hypotheses, the apparent 
saving in sample size is of the order of one-third to one-half of the fixed sample size. 

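Substituting the appropriate expression for R from (1 bis), the left-hand side of equation
(6) is
\[
\mathcal{E}\left[-\tfrac12\lambda\log_{10}e + \log_{10}M(X, Y; u)\right],
\]
where X = ½(N - s + q), Y = ½q, u = ½λG/(1 + G).
Now by Taylor's theorem
\[
\log_{10}M(X, Y; u) = \log_{10}M(X, Y; \mathcal{E}(u))
 + (u - \mathcal{E}(u))\left[\frac{\partial}{\partial u}\log_{10}M\right]_{u=\mathcal{E}(u)}
 + \frac{(u - \mathcal{E}(u))^2}{2!}\left[\frac{\partial^2}{\partial u^2}\log_{10}M\right]_{u=\mathcal{E}(u)} + \dots
\]

Thence we have to a first approximation
\[
-\tfrac12\lambda\log_{10}e + \log_{10}M(X, Y; \mathcal{E}(u)).
\]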










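When H_0 is true it may be shown that
\[
\mathcal{E}\left(\frac{G}{1+G}\right) = \frac{Y}{X},
\]
and when H_1 is true
\[
\mathcal{E}\left(\frac{G}{1+G}\right) = e^{-\frac12\lambda}\sum_{m=0}^{\infty}\frac{(\frac12\lambda)^m\,(Y+m)}{m!\,(X+m)}.
\]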


Table 10. Conjectural expected sample sizes (n or k) for the sequential test when H_0 is true
and corresponding values for the fixed sample size test, when α = β = 0.05

(a) One-way classification by groups (entries are values of n)

           δ = 0.5               δ = 1                 δ = 2
  k   Sequential  Fixed    Sequential  Fixed    Sequential    Fixed
  2      10        14          5         8          3          4-5
  3       8        11          4         6      (illegible)     4
  4      6-7        9          3         5          2           4
  5       6         8          3         4          2           3
  6       4         7          3         4          2           3
  7       4         6          3         4
  8                           2-3        4
  9                            2         3

(b) Randomized blocks (entries are values of k)

           δ = 0.5               δ = 1                 δ = 2
  n   Sequential  Fixed    Sequential  Fixed    Sequential    Fixed
  2      11        15         5-6        9          3           6
  3       8        12          4         7         2-3         4-5
  4       7        10          4         6          2           4
  5       6        8-9         3         5          2           3
  6       5         8          3        4-5         2           3
  7       5         7          3         4
  8       4         6         2-3        4











Thus on the assumption that H_0 is true we have, for α = β = 0.05:

(i) For the one-way classification by groups, to solve for n the equation
\[
-\tfrac12 kn\delta\log_{10}e + \log_{10}M\{\tfrac12(kn-1),\ \tfrac12(k-1);\ \tfrac12(k-1)\delta\} = -1.15,
\]
where in reaching this result nk/(nk - 1) has been taken as unity.

(ii) For the case of randomized blocks, to solve for k the equation
\[
-\tfrac12 kn\delta\log_{10}e + \log_{10}M\{\tfrac12 k(n-1),\ \tfrac12(n-1);\ \tfrac12 n\delta\} = -1.15.
\]

Such solutions are those tabulated in Table 10 (a) and (b). Examination of further terms in
the Taylor expansion reveals that no wide discrepancy results from considering only its first
term.
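
A minimal sketch of how the equation in (i) might be solved numerically is given below. It rests on two assumptions that are not stated in this section: that M(a, γ; x) is the confluent hypergeometric (Kummer) function, summed here from its power series, and that the left-hand side decreases steadily in n so that simple bisection applies. All function names are illustrative.

```python
import math


def kummer_m(a, b, x, tol=1e-15, max_terms=10000):
    """Power series for the confluent hypergeometric function M(a, b; x)."""
    term, total = 1.0, 1.0
    for j in range(max_terms):
        term *= (a + j) / (b + j) * x / (j + 1)
        total += term
        if abs(term) < tol * abs(total):
            break
    return total


def lhs_one_way(n, k, delta):
    """Left-hand side of the equation in (i), with nk/(nk - 1) taken as unity."""
    x_par = 0.5 * (k * n - 1)
    y_par = 0.5 * (k - 1)
    u_mean = 0.5 * (k - 1) * delta
    return (-0.5 * k * n * delta * math.log10(math.e)
            + math.log10(kummer_m(x_par, y_par, u_mean)))


def solve_n(k, delta, target, lo=1.0, hi=200.0, iters=60):
    """Bisection for n; assumes a single sign change of lhs - target on [lo, hi]."""
    f_lo = lhs_one_way(lo, k, delta) - target
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = lhs_one_way(mid, k, delta) - target
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    target = 0.9 * math.log10(0.05 / 0.95)  # right-hand side of (6) under H0, about -1.15
    print(solve_n(2, 1.0, target))
    print(solve_n(2, 0.5, target))
```

With k = 2 the roots found in this way lie a little above 5 for δ = 1 and a little above 10 for δ = 0.5, which is in line with the rounded entries 5 and 10 of Table 10 (a).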

Several sampling experiments have been carried out and give average sample sizes in 
very close agreement with those obtained by the above method. 

This conjectural formula of Bhate’s can also be used to give the expected sample size in 
a two-sided sequential t-test. This test is equivalent to a one-way classification by groups 
when k = 2. Arnold (1951) reports the sample sizes at which a decision was reached for each 
of 500 such tests both when the null hypothesis was true and when a particular alternative 
was true. The d of Arnold’s paper is, in our terminology, ,/(}0) and his v? corresponds to our 


The probabilities of error were taken to be α = β = 0.05 and δ = 1.
The average sample number when the null hypothesis was true was reported as 10.03,
while the value given by the conjectural formula was N = 2n = 10.0.
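On the assumption that H_1 is true we have to determine
\[
\mathcal{E}\left(\frac{G}{1+G}\right) = e^{-\frac12\lambda}\sum_{m=0}^{\infty}\frac{(\frac12\lambda)^m\,(Y+m)}{m!\,(X+m)}
\]
for different values of X, Y.
A more amenable form of this expression may be obtained thus. Let
\[
S = e^{-\frac12\lambda}\sum_{m=0}^{\infty}\frac{(\frac12\lambda)^m\,(Y+m)}{m!\,(X+m)} = e^{-\frac12\lambda}(Y - X)\,T + 1,
\]
where
\[
T = \frac{1}{X} + \frac{\frac12\lambda}{X+1} + \frac{(\frac12\lambda)^2}{2!\,(X+2)} + \dots
\]
Now
\[
(\tfrac12\lambda)^{X}\,T = \int_0^{\frac12\lambda} t^{X-1}e^{t}\,dt = y_X, \text{ say},
\]
and a reduction formula for y_X is
\[
y_X = (\tfrac12\lambda)^{X-1}e^{\frac12\lambda} - (X-1)\,y_{X-1}.
\]
Alternatively, when X = r + 1 is a positive integer,
\[
y_{r+1} = e^{\frac12\lambda}\left[(\tfrac12\lambda)^{r} - r(\tfrac12\lambda)^{r-1} + r(r-1)(\tfrac12\lambda)^{r-2} - \dots + (-1)^{r}r!\right] - (-1)^{r}r!.
\]
Hence
\[
S = \frac{e^{-\frac12\lambda}(Y - X)}{(\frac12\lambda)^{X}}\,y_X + 1.
\]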
A few calculations to determine the expected sample size when the alternative hypothesis 
H, is true gave results almost identical to those obtained when H, is true. 
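
The reduction just described can be checked numerically. The sketch below compares the direct series for S with the form obtained from the reduction formula for y_X, for an integral value of X; the starting value y_1 = exp(½λ) - 1 follows from the integral definition. The function names and the trial values (X = 4, Y = 1, ½λ = 4.5, i.e. k = 3, n = 3, δ = 1 in the one-way classification) are chosen purely for illustration.

```python
import math


def s_series(x_par, y_par, half_lam, terms=200):
    """Direct summation of exp(-lam/2) * sum (lam/2)^m (Y+m) / (m! (X+m))."""
    total, weight = 0.0, 1.0          # weight holds (lam/2)^m / m!
    for m in range(terms):
        total += weight * (y_par + m) / (x_par + m)
        weight *= half_lam / (m + 1)
    return math.exp(-half_lam) * total


def y_reduction(x_int, half_lam):
    """y_X for a positive integer X from y_X = a^(X-1) e^a - (X-1) y_(X-1)."""
    y = math.exp(half_lam) - 1.0      # y_1 = integral of e^t from 0 to a
    for r in range(2, x_int + 1):
        y = half_lam ** (r - 1) * math.exp(half_lam) - (r - 1) * y
    return y


def s_closed(x_int, y_par, half_lam):
    """S = exp(-a) (Y - X) y_X / a^X + 1 with a = lam/2."""
    return (1.0 + math.exp(-half_lam) * (y_par - x_int)
            * y_reduction(x_int, half_lam) / half_lam ** x_int)


if __name__ == "__main__":
    print(s_series(4, 1, 4.5), s_closed(4, 1, 4.5))
```

The two evaluations agree to rounding error, which gives some reassurance that the reduced form has been transcribed consistently.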













10. COMMENTS 


As far as is known there is no proof that tests of this type terminate with probability one.
Recently, however, David and Kruskal (1956) have provided such a proof for the
sequential t-test. Their proof uses some asymptotic forms for large n of the numerator and
denominator of the likelihood ratio. It may be therefore that the asymptotic forms (4, 5)
of §6 may be helpful in proving an analogous result for the tests described in this paper.

It is perhaps worth while noting that other designs in the analysis of variance may find 
the following tables applicable. For example, in a factorial experiment it may be required 
to know whether or not first to take two levels of a factor, then three or four until some 
hypothesis is accepted or rejected. So too in a hierarchal classification by groups a series 
of successive subgroups within main groups could be added until some postulated hypothesis 
was accepted or denied. 

The question of choosing the sequential test appropriate to fit a given situation is never 
altogether easy. In the present case if k (or n) is fixed by the nature of the problem, we 
have still a choice of values for the parameters α, β and δ, and in making this choice we may
well be influenced by the expected sample size.* The tables given in the present paper are,
of course, far from complete as far as α, β and δ are concerned, but it is hoped that they will
give enough information to make some application of theory to practice possible. 


I would like to thank Dr N. L. Johnson most gratefully for his advice and encouragement 
in the preparation of this paper and the Director General of the British Coal Utilization 
Research Association for permission to publish it. 


REFERENCES 


ARNOLD, K. J. (1951). Introduction to Tables to facilitate sequential t-tests. National Bureau of
Standards, A.S.M. 7.

BAKER, A. G. (1950). Properties of some tests in sequential analysis. Biometrika, 37, 334.

BARNARD, G. A. (1952). The frequency justification of certain sequential tests. Biometrika, 39, 144.

BHATE, D. H. (1954). Ph.D. Thesis, University of London.

COX, D. R. (1952). Sequential tests for composite hypotheses. Proc. Camb. Phil. Soc. 48, 290.

DAVID, H. T. & KRUSKAL, W. H. (1956). The WAGR sequential t-test reaches a decision with
probability one. Ann. Math. Statist. 27, 797.

HOEL, P. G. (1955). On a sequential test for the general linear hypothesis. Ann. Math. Statist. 26, 136.

JOHNSON, N. L. (1953). Some notes on the application of sequential methods in the analysis of variance.
Ann. Math. Statist. 24, 614.

PATNAIK, P. B. (1949). The non-central χ² and F distributions and their applications. Biometrika,
36, 202.

PEARSON, E. S. & HARTLEY, H. O. (1951). Charts of the power function for analysis of variance tests,
derived from the non-central F-distribution. Biometrika, 38, 112.

RUSHTON, S. (1954). On the confluent hypergeometric function M(α, γ; x). Sankhyā, 13, 369.

RUSHTON, S. & LANG, E. D. (1954). Tables of the confluent hypergeometric distribution. Sankhyā,
13, 377.


* Considerations that arise in the interpretation of δ were discussed in another connexion by
Pearson & Hartley (1951, pp. 125-9) when considering the use of charts for the power function of 
analysis of variance tests. 





Table 1. One-way classification by groups (true limits)

k = no. of groups; n = no. within a group; λ = nkδ; δ = 0.5; α = β = 0.05.

k = 2
   n      λ     G (lower)   G (upper)
   4      4        -          5.390
   6      6      0.002        1.091
   8      8      0.025        0.639
  10     10      0.040        0.471
  12     12      0.050        0.385
  14     14      0.059        0.333
  16     16      0.066        0.300
  18     18      0.072        0.276
  20     20      0.076        0.258
  30     30      0.089        0.205

k = 3
   n      λ     G (lower)   G (upper)
   5     7.5     0.037        1.319
   7    10.5     0.072        0.696
   9    13.5     0.089        0.497
  11    16.5     0.099        0.400
  13    19.5     0.105        0.344
  15    22.5     0.108        0.306
  17    25.5     0.110        0.280
  19    28.5     0.112        0.261
  21    31.5     0.115        0.245

k = 4
   n      λ     G (lower)   G (upper)
   4      8      0.065        1.825
   6     12      0.110        0.770
   8     16      0.126        0.521
  10     20      0.131        0.411
  12     24      0.134        0.350
  14     28      0.136        0.310
  16     32      0.138        0.282

k = 5
   n      λ     G (lower)   G (upper)
   3     7.5     0.072        4.176
   5    12.5     0.142        0.927
   7    17.5     0.155        0.568
   9    22.5     0.158        0.432
  11    27.5     0.159        0.360
  13    32.5     0.159        0.317
  15    37.5     0.167        0.287

k = 6
   n      λ     G (lower)   G (upper)
   2      6      0.008          -
   4     12      0.168        1.272
   6     18      0.183        0.646
   8     24      0.183        0.464
  10     30      0.180        0.377
  12     36      0.176        0.327

k = 7
   n      λ     G (lower)   G (upper)
   3    10.5     0.184        2.407
   5    17.5     0.211        0.784
   7    24.5     0.207        0.512
   9    31.5     0.199        0.401


Table 2. One-way classification by groups.
(Approximate adjusted values are printed in italics)

k = no. of groups; n = no. within a group; λ = nkδ; δ = 1.0; α = β = 0.05.

k = 2
   n      λ     G (lower)   G (upper)
   4      8      0.052        2.390
   5     10      0.082        1.380
   6     12      0.105        1.016
   7     14      0.121        0.826
   8     16      0.135        0.710
  10     20      0.150        0.578
  12     24      0.166        0.504
  16     32      0.187        0.424
  20     40      0.199        0.383
  30     60      0.215        0.333
  60    120      0.235        0.300

k = 3
   n      λ     G (lower)   G (upper)
   3      9      0.124        4.760
   4     12      0.169        1.79
   5     15      0.195        1.174
   6     18      0.210        0.902
   7     21      0.221        0.762
   9     27      0.234        0.605
  11     33      0.240        0.522
  13     39      0.244        0.471
  15     45      0.247        0.436
  21     63      0.251        0.377
  31     93      0.251        0.334
  51    153      0.251        0.306

k = 4
   n      λ     G (lower)   G (upper)
   2      8    (illegible)      -
   3     12      0.231        3.13
   4     16      0.266        1.498
   5     20      0.276        1.040
   6     24      0.284        0.838
   8     32      0.287        0.637
  10     40      0.286        0.540
  12     48      0.284        0.482
  16     64      0.280        0.424
  20     80      0.277        0.391
  30    120      0.271        0.342
  50    200      0.266        0.306

k = 5
   n      λ     G (lower)   G (upper)
   3     15      0.331        2.469
   4     20      0.345        1.332
   5     25      0.340        0.973
   6     30      0.335        0.792
   7     35      0.331        0.687
   9     45      0.322        0.565
  11     55      0.314        0.498
  13     65      0.308        0.456
  15     75      0.303        0.428
  25    125      0.288        0.360

k = 6
   n      λ     G (lower)   G (upper)
   2     12      0.381       24.042
   3     18      0.405        2.133
   4     24      0.398        1.237
   5     30      0.384        0.920
   6     36      0.373        0.763
   8     48      0.354        0.601
  10     60      0.340        0.521
  12     72      0.330        0.473
  14     84      0.322        0.440
  20    120      0.307        0.386

k = 7
   n      λ     G (lower)   G (upper)
   3     21      0.469        1.925
   4     28      0.447        1.175
   5     35      0.420        0.888
   6     42      0.405        0.745
   7     49      0.387        0.645
   9     63      0.365        0.552
  11     77      0.349        0.494
  13     91      0.338        0.456
  15    105      0.328        0.430
  21    147      0.315        0.384

k = 8
   n      λ     G (lower)   G (upper)
   2     16      0.566        7.500
   3     24      0.518        1.792
   4     32      0.479        1.120
   5     40      0.448        0.865
   6     48      0.424        0.730
   8     64      0.391        0.585
  10     80      0.371        0.512
  16    128      0.336        0.414
  20    160      0.324        0.387
  30    240      0.306        0.348
  50    400      0.288        0.318

k = 9
   n      λ     G (lower)   G (upper)
   3     27      0.565        1.684
   4     36      0.510        1.085
   5     45      0.470        0.845
   7     63      0.421        0.635
   9     81      0.392        0.540
  11     99      0.372        0.485
  15    135      0.347        0.425
  21    189      0.325        0.380
  31    279      0.306        0.345
  41    369      0.296        0.328

k = 10
   n      λ     G (lower)   G (upper)
   2     20      0.709        5.066
   4     40      0.534        1.06
   6     60      0.457        0.705
   8     80      0.415        0.575
  10    100      0.389        0.506
  16    160      0.347        0.414
  20    200      0.332        0.384
  30    300      0.312        0.348
  40    400      0.300        0.399
















Table 3. One-way classification by groups.
(Approximate adjusted values are printed in italics)

k = no. of groups; n = no. within a group; λ = nkδ; δ = 2.0; α = β = 0.05.

k = 2
   n      λ     G (lower)   G (upper)
   2      8    (illegible)      -
   4     16      0.279        2.153
   6     24      0.346        1.232
   8     32      0.380        0.967
  10     40      0.401        0.842
  12     48      0.416        0.775
  16     64      0.434        0.700
  20     80      0.444        0.660
  30    120      0.461        0.611
  40    160      0.469        0.580
  50    200      0.474        0.572

k = 3
   n      λ     G (lower)   G (upper)
   3     18      0.458        3.361
   5     30      0.497        1.403
   7     42      0.507        1.040
   9     54      0.510        0.891
  11     66      0.512        0.809
  13     78      0.512        0.75
  15     90      0.511        0.72
  21    126      0.510        0.661
  31    186      0.507        0.611
  41    246      0.505        0.592
  51    306      0.504        0.578

k = 4
   n      λ     G (lower)   G (upper)
   2     16      0.612       99.175
   4     32      0.616        1.696
   6     48      0.596        1.128
   8     64      0.583        0.935
  10     80      0.574        0.837
  12     96      0.567        0.777
  16    128      0.557        0.708
  20    160      0.550        0.669
  30    240      0.540        0.619
  40    320      0.535        0.596
  50    400      0.532        0.582

k = 5
   n      λ     G (lower)   G (upper)
   3     30      0.747        2.445
   5     50      0.677        1.272
   7     70      0.637        0.994
   9     90      0.613        0.869
  11    110      0.596        0.799
  13    130      0.584        0.753
  15    150      0.575        0.720
  21    210      0.556        0.662
  31    310      0.540        0.617
  41    410      0.531        0.595
  51    510      0.525        0.582

k = 6
   n      λ     G (lower)   G (upper)
   2     24      0.950          -
   4     48      0.765        1.535
   6     72      0.685        1.081
   8     96      0.645        0.913
  10    120      0.620        0.825
  12    144      0.602        0.770
  16    192      0.578        0.706
  20    240      0.563        0.669
  30    360      0.542        0.621
  40    480      0.530        0.598
  50    600      0.523        0.583


Table 4. Randomized blocks (true limits)

k = no. of blocks; n = no. of varieties; λ = nkδ; δ = 0.5; α = β = 0.05.

n = 2
   k      λ     G (lower)   G (upper)
   5      5        -        186.97
   7      7      0.027        2.586
   9      9      0.061        1.403
  11     11      0.086        1.009
  13     13      0.105        0.814
  15     15      0.120        0.698
  17     17      0.133        0.620
  27     27      0.170        0.448

n = 3
   k      λ     G (lower)   G (upper)
   4      6      0.005       11.366
   6      9      0.082        1.637
   8     12    (illegible)    0.953
  10     15      0.137        0.706
  12     18      0.148        0.580
  14     21      0.155        0.503
  16     24      0.160        0.451
  22     33      0.174        0.365

n = 4
   k      λ     G (lower)   G (upper)
   3      6    (illegible)      -
   5     10      0.120        1.635
   7     14      0.155        0.876
   9     18      0.168        0.684
  11     22      0.175        0.516
  13     26      0.180        0.446
  15     30      0.182        0.400
  17     34      0.183        0.367
  19     38      0.184        0.343

n = 5
   k      λ     G (lower)   G (upper)
   4     10      0.145        2.129
   6     15      0.181        0.914
   8     20      0.194        0.627
  10     25      0.198        0.499
  12     30      0.197        0.426
  14     35      0.196        0.380
  16     40      0.194        0.347

n = 6
   k      λ     G (lower)   G (upper)
   3      9      0.151        4.680
   5     15      0.211        1.061
   7     21      0.218        0.658
   9     27      0.216        0.505
  11     33      0.212        0.424
  13     39      0.208        0.374

n = 7
   k      λ     G (lower)   G (upper)
   2      7      0.089          -
   4     14      0.236        1.419
   6     21      0.286        0.731
   8     28      0.284        0.529
  10     35      0.226        0.433

n = 8
   k      λ     G (lower)   G (upper)
   3     12      0.254        2.627
   5     20      0.268        0.870
   7     28      0.254        0.573
   9     36      0.241        0.452

















Table 5. Randomized blocks. (Approximate adjusted values are printed in italics)

k = no. of blocks; n = no. of varieties; λ = nkδ; δ = 1.0; α = β = 0.05.

n = 2
   k      λ     G (lower)   G (upper)
   3      6      0.006          -
   5     10      0.147        5.383
   7     14      0.218        2.129
   9     18      0.269        1.470
  11     22      0.304        1.188
  13     26      0.329        1.032
  15     30      0.349        0.933

n = 3
   k      λ     G (lower)   G (upper)
   2      6      0.009          -
   4     12      0.236        3.774
   6     18      0.302        1.529
   8     24      0.331        1.071
  10     30      0.346        0.874
  12     36      0.355        0.765

n = 4
   k      λ     G (lower)   G (upper)
   3     12      0.293        6.32
   5     20      0.359        1.5
   7     28      0.378        0.987
   9     36      0.375        0.795
  11     44      0.375        0.694

n = 5
   k      λ     G (lower)   G (upper)
   2     10      0.308          -
   4     20      0.412        1.803
   6     30      0.411        1.02
   8     40      0.402        0.780

n = 6
   k      λ     G (lower)   G (upper)
   3     18      0.470        2.86
   5     30      0.453        1.137
   7     42      0.430        0.812
   9     54      0.412        0.675

n = 7
   k      λ     G (lower)   G (upper)
   2     14      0.529        2.748
   4     28      0.507        1.412
   6     42      0.463        0.878

n = 8
   k      λ     G (lower)   G (upper)
   3     24      0.581        2.15
   5     40      0.506        0.999
   7     56      0.460        0.740

Table 6. Randomized blocks. (Approximate adjusted values are printed in italics)

k = no. of blocks; n = no. of varieties; λ = nkδ; δ = 2.0; α = β = 0.05.

n = 2
   k      λ     G (lower)   G (upper)
   3     12      0.365          -
   5     20      0.557        4.379
   7     28      0.655        2.519
   9     36      0.715        1.977
  11     44      0.756        1.720
  15     60      0.806        1.47

n = 3
   k      λ     G (lower)   G (upper)
   2     12      0.488          -
   4     24      0.672        3.399
   6     36      0.716        1.9
   8     48      0.734        1.49
  10     60      0.742        1.299

n = 4
   k      λ     G (lower)   G (upper)
   3     24      0.779        4.482
   5     40      0.779        1.86
   7     56      0.766        1.39
   9     72      0.754        1.19
  11     88      0.745        1.08

n = 5
   k      λ     G (lower)   G (upper)
   2     20      0.920       29.864
   4     40      0.858        2.102
   6     60      0.810        1.406
   8     80      0.780        1.166

n = 6
   k      λ     G (lower)   G (upper)
   3     36      0.978        2.904
   5     60      0.884        1.617
   7     84      0.830        1.189
   9    108      0.800        1.040

















































LOGNORMAL APPROXIMATION TO PRODUCTS AND QUOTIENTS 


By S. R. BROADBENT


The British Coal Utilization Research Association 


SUMMARY. Measurements have been made which are subject to error, and we are required to give 
limits to a combination of the values measured. The combinations considered here are products and 
quotients. Some exact results are available in simple cases, but otherwise an approximation is required. 
The lognormal distribution, which is asymptotically exact, is shown to give useful approximations when 
fitted by moments to the combination. This method of fitting is nearly optimum in a defined metric. 
Tables are given which make its application simple. 


1. INTRODUCTION 


The problem we shall consider is assessing the precision of certain combinations of
measurements which have been made with a known distribution of error. Suppose the unbiased
measurements x_i (i = 1, ..., n) are made, that E[x_i] = μ_i, and that we are interested in a
combination of the x_i of the form
\[
q = (x_1 x_2 \cdots x_j)/(x_{j+1} \cdots x_n) \qquad (1 \le j \le n)
\]
(more generally, the observations x_i may be raised to any power).
We may be required to set fiducial limits to
\[
(\mu_1\mu_2\cdots\mu_j)/(\mu_{j+1}\cdots\mu_n),
\]
i.e. to combine the fiducial distributions of the μ_i which we have attempted to measure, or
to set probability limits to q, i.e. to combine the probability distributions of the x_i. In the
first case we assume the distributions of x_i known, except for the means μ_i, and that the
observations x_i are given; in the second we assume the distributions of the x_i completely
known. The two problems are formally identical,* and we shall henceforward speak only
of setting limits to q.

To fix the ideas with a particular example of the first case, consider the efficiency E of 
a steam boiler determined by a single trial in which the heat supplied to and the heat 
obtained from the boiler are obtained from sample measurements. The efficiency may be 


calculated from
\[
E = HW(1 - C)/\{Qh(1 - C')\},
\]
where H is the heat in 1 lb. of steam, W the weight of water evaporated during the trial,
C the air-dried moisture content of the coal, Q the weight of coal fired, h its air-dried calorific
value and C′ its moisture content as fired. All these measurements are subject to errors whose
distribution is known; the effect of these errors on the precision of E is required.

In general, the error of a measurement has two possible sources: sampling technique and 
instrument. In this paper we shall for simplicity suppose the errors to be either normal or 
rectangular, of known coefficients of variation or half-range, and with small coefficients of 
variation, although the restriction to these distributions is not necessary. In many applica- 
tions, the distributions of the errors are known to be of one of these two forms, and their 
standard deviations or half-ranges can usually be found by simple investigations. In other 
applications the errors can only be guessed, and then the roughest approximations are 
appropriate. 


* We assume that fiducial distributions can be combined in the same way as probability distributions,
but see the discussion in Creasy (1954) on this point. 







S. R. BRoADBENT 405 


It is impossible to list exhaustively all such combinations of errors. Some of the simpler 
combinations can be discussed individually; in the event of a large number of errors being 
combined we can use with confidence the asymptotic distribution. It is in the intermediate 
cases that approximations must be critically considered. The choice of a suitable family of 
approximating distributions will always be more of an art than a science. 

It is possible in a particular case to refine approximation to any required degree.
Gram-Charlier or similar series may be used; the saddlepoint method given by Daniels (1954)
also generates approximating functions. However, in this paper we are concerned with a
working or first-order approximation only: all that some data and applications merit.

It is well known that, even when n is small and the component distributions are far from
normal, x_1 + ... + x_n has a distribution close to normal. When the component variates are
not added but multiplied and divided, the fundamental approximating distribution is the
lognormal, as was pointed out, for example, by Shellard (1952). We consider the questions,
how is this distribution best fitted and when may we use it with confidence?


2. KNOWN RESULTS 


2.1. We now suppose the errors of measurement are independent, and we denote a
normal variate by N and a rectangular variate by R. The quotient of the standard deviation
and the mean of N, and the quotient of the half-range and mean of R are both denoted by
α_1, α_2, etc., referring to the variates x_1, x_2, ... in this order. Thus R/N (α_1 = 0.1, α_2 = 0.05)
denotes the quotient of a rectangular variate whose mean is ten times its half-range and an
independent normal variate whose coefficient of variation is 5%.

For q = N/N Geary's approximation (1930) is used in all practical cases. This states that
if Q is the quotient of the means of the numerator and denominator,
\[
(q - Q)/\sqrt{(\alpha_2^2 q^2 + \alpha_1^2 Q^2)}
\]
is approximately normally distributed with zero mean and unit variance. This approximation
is very good up to values of α_1 and α_2 as large as 0.25 (see Creasy, 1954). The exact
percentage points of this distribution may also be calculated using existing tables as was
shown by Fieller (1932). The product N × N was discussed by Craig (1936) and Aroian (1947).

Percentage points of R/N have been given by the author (1954). He has given also points
of the quotient of a triangular and a normal variate.

The distributions of R × R, R/R and so on are not difficult to calculate exactly.

Distributions of products, quotients and powers of variates have been extensively 
studied, but not many results have been obtained that can be applied to practical problems. 


3. LOGNORMAL APPROXIMATIONS 


3.1. The distribution of
\[
q = (x_1 x_2 \cdots x_j)/(x_{j+1} \cdots x_n)
\]
tends to the lognormal as n → ∞ under very general conditions. The lognormal distribution
has been discussed by Finney (1941), Gaddum (1945) and Johnson (1949).

The most general lognormal approximation to q is a variate z such that log (z - ξ) is
normally distributed with mean μ and variance σ². Here for simplicity we consider the
choice of μ and σ only, ξ being always taken as zero. A variate with this distribution is
necessarily positive, while q may have negative values. When α_1, α_2, ... are small, the
approximation by this lognormal distribution may nevertheless be satisfactory, since the
probability attached to such negative values is very small.








406 Lognormal approximation to products and quotients


3.2. The first method of choosing μ and σ is to calculate the moments of log q and to set
μ and σ² equal to the first and second corrected moments. We call this the method of fitting
by moments to log q, or the fit to log q; this method is used for estimation by Finney (1941).
It is also intuitively attractive to choose that lognormal approximation whose mean and
variance are equal to the mean and variance of q (see Wicksell, 1917). We call this the
method of fitting by moments to q, or the fit to q. The two fits are in general different, although
the difference tends to zero as n increases. The difference throws a doubt on the intuitive
appeal of fitting by moments, and raises the question we consider below, by what criterion
are we to choose our approximation?

Let a lognormal distribution have mean m and variance s², and let the distribution of the
logarithm of a variate from the lognormal have mean μ and variance σ². Then it is known
that
\[
\left.
\begin{aligned}
m &= \exp(\mu + \sigma^2/2),\\
s^2 &= \{\exp(2\mu + \sigma^2)\}\{\exp(\sigma^2) - 1\},\\
\mu &= \log m - \tfrac12\log(1 + s^2/m^2),\\
\sigma^2 &= \log(1 + s^2/m^2).
\end{aligned}
\right\}
\tag{1}
\]

For the fit to log q, we must find the mean and variance of log q, and set μ and σ² equal to
them. The t % point of this fit is
\[
\exp(\mu + p_t\sigma), \tag{2}
\]
where the probability is t/100 that a standardized normal variate is less than p_t.
For the fit to q, we must find the mean and variance of q, and set m and s² equal to them.
The t % point of this fit is
\[
(m\exp[p_t\{\log(1 + v^2)\}^{1/2}])/(1 + v^2)^{1/2}, \tag{3}
\]
where 100v = 100s/m, the coefficient of variation of q. This t % point is tabulated in Table 1
for m = 1, 100v = 0(0.1)15 and t = 1, 5, 95 and 99. To use this table it is necessary to find
the mean and coefficient of variation of q, to enter the table with the appropriate v, and
multiply the value in the table by m. The table was calculated from a series expansion of (3).
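
The relations (1) to (3) are easily evaluated directly, which gives a check on entries of Table 1. The following is a minimal sketch, not the series expansion from which the table was computed; the function names are illustrative, and the normal percentage point p_t is taken from Python's standard `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def lognormal_params(m, s):
    """Relations (1): mean and s.d. of log z from the mean m and s.d. s of z."""
    sigma2 = math.log(1.0 + (s / m) ** 2)
    mu = math.log(m) - 0.5 * sigma2
    return mu, math.sqrt(sigma2)


def fit_to_q_point(m, v, t):
    """Formula (3): t % point of the lognormal fitted to mean m and s.d. v*m.

    Here v is the coefficient of variation as a fraction (0.10 for 10 %).
    """
    p_t = NormalDist().inv_cdf(t / 100.0)
    return m * math.exp(p_t * math.sqrt(math.log(1.0 + v * v))) / math.sqrt(1.0 + v * v)


if __name__ == "__main__":
    # Lower 1 % points for m = 1 at coefficients of variation of 1 % and 10 %.
    print(round(fit_to_q_point(1.0, 0.01, 1), 4))
    print(round(fit_to_q_point(1.0, 0.10, 1), 4))
```

For m = 1 the two printed values may be compared with the entries of Table 1 at v = 1.0 and v = 10.0 in the lower 1 % section.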


3.3. To find the moments of q or of log q we require the moments of various powers
(positive and negative) of x or of log x, when x is normally and when x is rectangularly
distributed. We write x = μ(1 + αy), where y is either a standardized normal variate or is
uniformly distributed between -1 and 1, and α is small (less than 0.15). We begin by
finding the moments of log (1 + αy), not when y is normally distributed, but (since
Pr{1 + αy < 0} = ε is less than 10^{-10}) when y is from the truncated distribution
\[
\exp(-\tfrac12 y^2)\,dy/\{(1 - \epsilon)\sqrt{(2\pi)}\} \qquad (y > -1/\alpha), \tag{4}
\]
which is for practical purposes indistinguishable from the normal. We avoid mathematical
difficulties by taking this distribution as the parent. Alternatively, the treatment of
convergence given by Derksen (1939) could be used.

The moment-generating function of log (1 + αy) is E[(1 + αy)^{it}], and this may be written
\[
\frac{1}{(1 - \epsilon)\sqrt{(2\pi)}}\int_{-1/\alpha}^{1/\alpha}\{1 + it\alpha y + \tfrac12 it(it - 1)\alpha^2y^2 + \dots\}\exp(-\tfrac12 y^2)\,dy + K,
\]
where |K| < ε/(1 - ε). Let
\[
J(\alpha, r) = \frac{1}{\sqrt{(2\pi)}}\int_{1/\alpha}^{\infty} y^{r}\exp(-\tfrac12 y^2)\,dy
 = \{1 - I[1/(2\alpha^2),\ \tfrac12(r - 1)]\}\,2^{\frac12(r-1)}\,\Gamma\{\tfrac12(r + 1)\}/\sqrt{(2\pi)}.
\]













Here I(y, p) is the Incomplete Gamma ratio
\[
\int_0^{y} v^{p}e^{-v}\,dv/\Gamma(p + 1),
\]
and is nearly one when y is large and p small; we deduce from Pearson's table (1922) that
for α < 0.15 and r ≤ 6, J(α, r) < 10^{-5}, and we therefore neglect it in this region.

The series above is uniformly convergent within the range of integration, so that term-by-
term integration is permissible. We may replace the limits of integration by (-∞, ∞) in
the first four non-zero terms when α < 0.15, since the error so introduced is a factor of
{1 - 2J(α, r)/(1 - ε)} for r = 0, 2, 4 and 6. As K is negligible we obtain to sufficient accuracy
the first four terms of the convergent series for the moment-generating function
\[
1 + \alpha^2 it(it - 1)/2 + \alpha^4 it(it - 1)(it - 2)(it - 3)/8 + \alpha^6 it\dots(it - 5)/48,
\]
and hence the cumulants of log (1 + αy),
\[
\left.
\begin{aligned}
\mu_1' = \kappa_1 &= -\alpha^2/2 - 3\alpha^4/4 - 5\alpha^6/2 - \dots,\\
\mu_2 = \kappa_2 &= \alpha^2 + 5\alpha^4/2 + 32\alpha^6/3 + \dots,\\
\kappa_3 &= -3\alpha^4 - 22\alpha^6 - \dots,\\
\kappa_4 &= 20\alpha^6 + \dots.
\end{aligned}
\right\}
\tag{5}
\]
The cumulants of log (1 + αy) when y is rectangularly distributed in (-1, 1) are obtained
in a similar way. We have
\[
\left.
\begin{aligned}
\mu_1' = \kappa_1 &= -\alpha^2/6 - \alpha^4/20 - \alpha^6/42 - \dots,\\
\mu_2 = \kappa_2 &= \alpha^2/3 + 7\alpha^4/45 + 29\alpha^6/315 + \dots,\\
\kappa_3 &= -2\alpha^4/15 - 512\alpha^6/3780 - \dots,\\
\kappa_4 &= -2\alpha^4/15 - 16\alpha^6/315 - \dots.
\end{aligned}
\right\}
\tag{6}
\]
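
For the rectangular case the first two cumulants can be checked against exact values, since the mean and variance of log (1 + αy) are available in closed form when 1 + αy is uniform on (1 - α, 1 + α). The sketch below makes that comparison for one illustrative value, α = 0.1; the function names are not from the paper, and the series are truncated after the α⁶ terms.

```python
import math


def exact_mean_var(alpha):
    """Exact mean and variance of log(1 + alpha*y), y uniform on (-1, 1)."""
    a, b = 1.0 - alpha, 1.0 + alpha

    def f1(u):  # antiderivative of log u
        return u * math.log(u) - u

    def f2(u):  # antiderivative of (log u)^2
        return u * math.log(u) ** 2 - 2.0 * u * math.log(u) + 2.0 * u

    mean = (f1(b) - f1(a)) / (b - a)
    second = (f2(b) - f2(a)) / (b - a)
    return mean, second - mean * mean


def series_mean_var(alpha):
    """First two cumulants for the rectangular case, truncated after alpha^6."""
    a2 = alpha * alpha
    k1 = -a2 / 6 - a2 ** 2 / 20 - a2 ** 3 / 42
    k2 = a2 / 3 + 7 * a2 ** 2 / 45 + 29 * a2 ** 3 / 315
    return k1, k2


if __name__ == "__main__":
    print(exact_mean_var(0.1))
    print(series_mean_var(0.1))
```

At α = 0.1 the truncated series and the exact values differ only by terms of order α⁸, as would be expected.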


We next require the expectation of various powers of (1+ ay) when y is normally dis- 
tributed and when y is rectangularly distributed. These have been calculated for the trun- 
cated normal distribution (4) and the rectangular distribution; the arguments are similar 
to those given above and are not repeated here. Finally, we have arranged the results
commonly required in Table 2. These formulae, with (2) or Table 1, enable the percentage 
points of the two lognormal fits to q to be calculated. 

The power series above have been formed by identifying coefficients in the expansion of 
the moment-generating function. We have given only the first four terms of this expansion, 
for moderate a. For later terms or larger a, important corrections would have to be applied 
to the coefficients obtained by uncritically continuing the series. Indeed, Wicksell (1921) 
pointed out that some of the series in (5) and (6) may diverge if the later coefficients are 
uncritically formed. The simple rules for obtaining these coefficients apply only under the 
conditions stated. 

The results given may be extended to cases in which the variates are not independent. 
It is necessary only to find the moments for correlated variates in the same way as above; 
some moments are given by Haldane (1942). 

As an example, consider the lognormal approximations to q = N/N; by convention the
coefficients of variation of numerator and denominator are respectively $100\alpha_1$ and $100\alpha_2$.
Using Table 2 we obtain

$$E[q] = m = (1)(1 + \alpha_2^2 + 3\alpha_2^4 + \ldots),$$

and

$$E[q^2] = (1 + \alpha_1^2)(1 + 3\alpha_2^2 + 15\alpha_2^4 + \ldots),$$

i.e.

$$V[q] = s^2 = \alpha_1^2 + \alpha_2^2 + 3\alpha_1^2\alpha_2^2 + 8\alpha_2^4 + \ldots.$$
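The arithmetic of this example can be checked with a short sketch. It only evaluates the truncated series quoted above; the function name and the choice $\alpha_1 = \alpha_2 = 0.05$ (the N/N case of Table 3) are illustrative assumptions, not part of the original calculation.

```python
import math

def quotient_moments_normal(a1, a2):
    """Mean and variance of q = N/N from the truncated series in the text.

    a1, a2 are the coefficients of variation (as fractions, not per cent)
    of the numerator and the denominator.
    """
    m = 1 + a2**2 + 3 * a2**4                          # E[q]
    eq2 = (1 + a1**2) * (1 + 3 * a2**2 + 15 * a2**4)   # E[q^2]
    var_direct = eq2 - m**2                            # E[q^2] - m^2, all retained terms
    var_series = a1**2 + a2**2 + 3 * a1**2 * a2**2 + 8 * a2**4
    return m, var_direct, var_series

m, var_direct, var_series = quotient_moments_normal(0.05, 0.05)
v = 100 * math.sqrt(var_series) / m   # coefficient of variation of q, per cent
print(m, var_series, v)
```

The two variance values differ only in terms of sixth order, which is the accuracy claimed for the series.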








Table 1. Standardized lognormal percentage points

Given the mean m and standard deviation s, let v = 100s/m. The percentage point of the lognormal
distribution with this mean and standard deviation is the entry in the table, multiplied by m.

Lower 1% points (t = 1)

v | 0.0 | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | Δ
0 | 1.0000 | 9977 | 9954 | 9930 | 9907 | 9884 | 9861 | 9838 | 9815 | 9792 | -23
1 | 0.9770 | 9747 | 9724 | 9701 | 9679 | 9656 | 9633 | 9611 | 9588 | 9566 | -23
2 | 0.9544 | 9521 | 9499 | 9477 | 9454 | 9432 | 9410 | 9388 | 9366 | 9344 | -22
3 | 0.9322 | 9300 | 9278 | 9256 | 9234 | 9213 | 9191 | 9169 | 9148 | 9126 | -22
4 | 0.9105 | 9083 | 9061 | 9040 | 9019 | 8997 | 8976 | 8955 | 8933 | 8912 | -21
5 | 0.8892 | 8870 | 8849 | 8828 | 8807 | 8787 | 8766 | 8745 | 8724 | 8703 | -21
6 | 0.8683 | 8662 | 8641 | 8621 | 8600 | 8580 | 8559 | 8539 | 8519 | 8498 | -20
7 | 0.8478 | 8458 | 8438 | 8418 | 8398 | 8378 | 8357 | 8338 | 8318 | 8298 | -20
8 | 0.8278 | 8258 | 8238 | 8219 | 8199 | 8179 | 8160 | 8140 | 8121 | 8101 | -20
9 | 0.8082 | 8062 | 8043 | 8024 | 8004 | 7985 | 7966 | 7947 | 7928 | 7909 | -19
10 | 0.7890 | 7871 | 7852 | 7833 | 7814 | 7795 | 7776 | 7758 | 7739 | 7720 | -19
11 | 0.7702 | 7683 | 7665 | 7646 | 7628 | 7609 | 7591 | 7572 | 7554 | 7536 | -18
12 | 0.7518 | 7500 | 7481 | 7462 | 7445 | 7427 | 7409 | 7391 | 7374 | 7356 | -18
13 | 0.7338 | 7320 | 7302 | 7285 | 7267 | 7249 | 7232 | 7214 | 7197 | 7179 | -17
14 | 0.7162 | 7144 | 7127 | 7110 | 7093 | 7075 | 7058 | 7041 | 7024 | 7007 | -17
15 | 0.6990 | - | - | - | - | - | - | - | - | - | -
Lower 5% points (t = 5)

v | 0.0 | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | Δ
0 | 1.0000 | 9983 | 9967 | 9951 | 9934 | 9918 | 9902 | 9885 | 9869 | 9853 | -16
1 | 0.9836 | 9820 | 9804 | 9788 | 9771 | 9755 | 9739 | 9723 | 9707 | 9691 | -16
2 | 0.9675 | 9658 | 9642 | 9626 | 9610 | 9594 | 9578 | 9562 | 9546 | 9530 | -16
3 | 0.9514 | 9498 | 9482 | 9467 | 9451 | 9435 | 9419 | 9403 | 9388 | 9372 | -16
4 | 0.9356 | 9340 | 9324 | 9309 | 9293 | 9278 | 9262 | 9246 | 9231 | 9215 | -16
5 | 0.9200 | 9184 | 9168 | 9153 | 9137 | 9122 | 9106 | 9091 | 9075 | 9060 | -15
6 | 0.9045 | 9029 | 9014 | 8999 | 8983 | 8968 | 8953 | 8937 | 8922 | 8907 | -15
7 | 0.8892 | 8877 | 8862 | 8846 | 8831 | 8816 | 8801 | 8786 | 8771 | 8756 | -15
8 | 0.8741 | 8726 | 8711 | 8696 | 8681 | 8666 | 8651 | 8636 | 8622 | 8607 | -15
9 | 0.8592 | 8577 | 8562 | 8548 | 8533 | 8518 | 8503 | 8489 | 8474 | 8460 | -15
10 | 0.8445 | 8430 | 8415 | 8401 | 8386 | 8372 | 8357 | 8343 | 8328 | 8314 | -15
11 | 0.8299 | 8285 | 8271 | 8256 | 8242 | 8228 | 8213 | 8199 | 8185 | 8170 | -14
12 | 0.8156 | 8142 | 8128 | 8113 | 8099 | 8085 | 8071 | 8057 | 8043 | 8029 | -14
13 | 0.8015 | 8001 | 7987 | 7973 | 7959 | 7945 | 7931 | 7917 | 7903 | 7889 | -14
14 | 0.7875 | 7861 | 7848 | 7834 | 7820 | 7806 | 7793 | 7779 | 7765 | 7751 | -14
15 | 0.7738 | - | - | - | - | - | - | - | - | - | -


























Upper 5% points (t = 95)

v | 0.0 | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | Δ
0 | 1.0000 | 0016 | 0033 | 0049 | 0066 | 0083 | 0099 | 0115 | 0132 | 0149 | +17
1 | 1.0165 | 0182 | 0198 | 0215 | 0232 | 0249 | 0265 | 0282 | 0299 | 0316 | +17
2 | 1.0332 | 0349 | 0366 | 0383 | 0400 | 0417 | 0433 | 0450 | 0467 | 0484 | +17
3 | 1.0501 | 0518 | 0535 | 0552 | 0569 | 0586 | 0603 | 0620 | 0637 | 0654 | +17
4 | 1.0671 | 0688 | 0705 | 0723 | 0740 | 0757 | 0774 | 0791 | 0809 | 0826 | +17
5 | 1.0843 | 0860 | 0878 | 0895 | 0912 | 0930 | 0947 | 0964 | 0982 | 0999 | +17
6 | 1.1017 | 1034 | 1051 | 1069 | 1086 | 1104 | 1121 | 1139 | 1156 | 1174 | +17
7 | 1.1191 | 1209 | 1226 | 1244 | 1262 | 1279 | 1297 | 1315 | 1332 | 1350 | +18
8 | 1.1368 | 1385 | 1403 | 1421 | 1439 | 1456 | 1474 | 1492 | 1510 | 1528 | +18
9 | 1.1545 | 1563 | 1581 | 1599 | 1617 | 1635 | 1653 | 1671 | 1689 | 1707 | +18
10 | 1.1725 | 1742 | 1760 | 1779 | 1797 | 1815 | 1833 | 1851 | 1869 | 1887 | +18
11 | 1.1905 | 1923 | 1941 | 1959 | 1978 | 1996 | 2014 | 2032 | 2050 | 2069 | +18
12 | 1.2087 | 2105 | 2123 | 2142 | 2160 | 2178 | 2196 | 2215 | 2233 | 2252 | +18
13 | 1.2270 | 2288 | 2307 | 2325 | 2343 | 2362 | 2380 | 2399 | 2417 | 2436 | +18
14 | 1.2454 | 2472 | 2491 | 2510 | 2528 | 2547 | 2565 | 2584 | 2602 | 2621 | +19
15 | 1.2689 | - | - | - | - | - | - | - | - | - | -
Upper 1% points (t = 99)

v | 0.0 | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | Δ
0 | 1.0000 | 0023 | 0047 | 0070 | 0094 | 0117 | 0141 | 0164 | 0188 | 0211 | +23
1 | 1.0235 | 0259 | 0282 | 0306 | 0330 | 0354 | 0378 | 0402 | 0426 | 0450 | +24
2 | 1.0474 | 0498 | 0522 | 0547 | 0571 | 0595 | 0620 | 0644 | 0669 | 0693 | +24
3 | 1.0718 | 0743 | 0767 | 0792 | 0817 | 0841 | 0866 | 0891 | 0916 | 0941 | +25
4 | 1.0966 | 0991 | 1016 | 1041 | 1067 | 1092 | 1117 | 1142 | 1168 | 1193 | +25
5 | 1.1219 | 1244 | 1279 | 1296 | 1321 | 1347 | 1372 | 1398 | 1424 | 1450 | +26
6 | 1.1476 | 1502 | 1528 | 1554 | 1580 | 1606 | 1632 | 1659 | 1685 | 1711 | +26
7 | 1.1726 | 1764 | 1790 | 1817 | 1843 | 1870 | 1896 | 1923 | 1950 | 1977 | +27
8 | 1.2004 | 2030 | 2057 | 2084 | 2111 | 2138 | 2165 | 2193 | 2220 | 2247 | +27
9 | 1.2274 | 2302 | 2329 | 2356 | 2384 | 2411 | 2439 | 2466 | 2494 | 2522 | +28
10 | 1.2549 | 2577 | 2605 | 2633 | 2661 | 2689 | 2716 | 2745 | 2773 | 2801 | +28
11 | 1.2829 | 2857 | 2885 | 2914 | 2942 | 2970 | 2999 | 3027 | 3056 | 3084 | +28
12 | 1.3113 | 3141 | 3170 | 3199 | 3228 | 3257 | 3285 | 3314 | 3343 | 3372 | +29
13 | 1.3402 | 3431 | 3460 | 3489 | 3518 | 3547 | 3577 | 3606 | 3636 | 3665 | +29
14 | 1.3695 | 3724 | 3753 | 3785 | 3813 | 3843 | 3872 | 3902 | 3932 | 3962 | +30
15 | 1.3992 | - | - | - | - | - | - | - | - | - | -
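For values of v or t not tabulated, the entries of Table 1 can be recomputed directly. The sketch below assumes, as the note to the table indicates, that each entry is the t per cent point of a lognormal distribution with unit mean and coefficient of variation v per cent; the function name is illustrative and the normal quantile is taken from the Python standard library.

```python
import math
from statistics import NormalDist

def standardized_lognormal_point(v, t):
    """t per cent point of a lognormal with mean 1 and coefficient of variation v per cent."""
    sigma2 = math.log(1 + (v / 100) ** 2)   # variance of the logarithm
    z = NormalDist().inv_cdf(t / 100)       # standard normal quantile
    # The mean of the logarithm is -sigma2/2, so that the lognormal has mean one.
    return math.exp(z * math.sqrt(sigma2) - sigma2 / 2)

for v, t in [(1, 1), (10, 5), (15, 99)]:
    print(v, t, round(standardized_lognormal_point(v, t), 4))
```

Multiplying the result by m gives the percentage point of the fitted distribution, as with the tabulated entries.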

















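Similarly,

$$E[\log q] = \mu = -(\tfrac12\alpha_1^2 + \tfrac34\alpha_1^4 + \ldots) + (\tfrac12\alpha_2^2 + \tfrac34\alpha_2^4 + \ldots),$$

and

$$V[\log q] = \sigma^2 = (\alpha_1^2 + \tfrac52\alpha_1^4 + \ldots) + (\alpha_2^2 + \tfrac52\alpha_2^4 + \ldots).$$

Using these values of $\mu$ and $\sigma^2$ we obtain for the first two moments of the lognormal fitted
to log q,

$$\text{mean: } 1 + \alpha_2^2 + \tfrac12\alpha_1^4 + \tfrac52\alpha_2^4 + \ldots,$$

$$\text{variance: } \alpha_1^2 + \alpha_2^2 + 3\alpha_1^4 + 3\alpha_1^2\alpha_2^2 + 5\alpha_2^4 + \ldots.$$

Since these agree with m and $s^2$ to $O(\alpha_1^2, \alpha_2^2)$, and to higher order when $\alpha_1 = \alpha_2$, the difference
between the two fits will be small.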
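The fit to log q can be sketched in the same way. The code below uses only the two-term series for $\mu$ and $\sigma^2$ given above for q = N/N; the function names are illustrative, and the case $\alpha_1 = \alpha_2 = 0.05$ is chosen because it appears in Table 3.

```python
import math
from statistics import NormalDist

def log_fit_parameters(a1, a2):
    """Mean and variance of log q for q = N/N, from the truncated series."""
    mu = -(a1**2 / 2 + 3 * a1**4 / 4) + (a2**2 / 2 + 3 * a2**4 / 4)
    sigma2 = (a1**2 + 5 * a1**4 / 2) + (a2**2 + 5 * a2**4 / 2)
    return mu, sigma2

def point_from_log_fit(a1, a2, t):
    """t per cent point of q when log q is treated as normal (mu, sigma2)."""
    mu, sigma2 = log_fit_parameters(a1, a2)
    z = NormalDist().inv_cdf(t / 100)
    return math.exp(mu + z * math.sqrt(sigma2))

mu, sigma2 = log_fit_parameters(0.05, 0.05)
fitted_mean = math.exp(mu + sigma2 / 2)        # mean of the fitted lognormal
series_mean = 1 + 0.05**2 + 0.5 * 0.05**4 + 2.5 * 0.05**4
lower_1 = point_from_log_fit(0.05, 0.05, 1)
print(fitted_mean, series_mean, lower_1)
```

For this case the lower 1% point comes out close to the 0.848 shown for N/N in Table 3.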














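Table 2

Quantity | r | y is distributed normally with mean zero and variance one | y is distributed rectangularly in the interval (-1, 1)
$E[(1+\alpha y)^r]$ | 1/2 | $1 - \alpha^2/8 - 15\alpha^4/128 - \ldots$ | $1 - \alpha^2/24 - \alpha^4/128 - \ldots$
 | 1 | $1$ | $1$
 | 2 | $1 + \alpha^2$ | $1 + \alpha^2/3$
 | 4 | $1 + 6\alpha^2 + 3\alpha^4$ | $1 + 2\alpha^2 + \alpha^4/5$
 | -1/2 | $1 + 3\alpha^2/8 + 105\alpha^4/128 + \ldots$ | $1 + \alpha^2/8 + 7\alpha^4/128 + \ldots$
 | -1 | $1 + \alpha^2 + 3\alpha^4 + \ldots$ | $1 + \alpha^2/3 + \alpha^4/5 + \ldots$
 | -2 | $1 + 3\alpha^2 + 15\alpha^4 + \ldots$ | $1/(1-\alpha^2)$
 | -4 | $1 + 10\alpha^2 + 105\alpha^4 + \ldots$ | $1 + 10\alpha^2/3 + 7\alpha^4 + \ldots$
$E[\log(1+\alpha y)^r]$ | | $-r(\alpha^2/2 + 3\alpha^4/4 + \ldots)$ | $-r(\alpha^2/6 + \alpha^4/20 + \ldots)$
$V[\log(1+\alpha y)^r]$ | | $r^2(\alpha^2 + 5\alpha^4/2 + \ldots)$ | $r^2(\alpha^2/3 + 7\alpha^4/45 + \ldots)$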
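The power rows of Table 2 follow from the binomial expansion of $(1+\alpha y)^r$ together with the even moments of y. A minimal sketch of that expansion is given below; the function names and the `terms` argument are illustrative assumptions, and the series is simply truncated, with no allowance for the truncation of the normal distribution discussed earlier.

```python
from math import factorial

def gen_binom(r, k):
    """Generalized binomial coefficient r(r-1)...(r-k+1)/k!."""
    num = 1.0
    for i in range(k):
        num *= (r - i)
    return num / factorial(k)

def even_moment(order, dist):
    """E[y^order] for even order: (order-1)!! for the unit normal,
    1/(order+1) for the rectangular distribution on (-1, 1)."""
    if dist == "normal":
        out = 1
        for j in range(order - 1, 0, -2):
            out *= j
        return out
    if dist == "rectangular":
        return 1 / (order + 1)
    raise ValueError(dist)

def power_moment(r, alpha, dist, terms=3):
    """Truncated series for E[(1 + alpha*y)^r], keeping `terms` even-order terms."""
    return sum(gen_binom(r, 2 * k) * alpha ** (2 * k) * even_moment(2 * k, dist)
               for k in range(terms))

print(power_moment(4, 0.1, "normal"))        # 1 + 6a^2 + 3a^4 with a = 0.1
print(power_moment(-2, 0.1, "rectangular"))  # 1 + a^2 + a^4, the start of 1/(1 - a^2)
```

Odd-order terms are omitted because the odd moments of both distributions vanish.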








It is not possible to state how different the two will be in general. We compare below the 
percentage points given by the two fits with some exact values (Table 3). The comparisons 
are made for quite large coefficients of variation, and for distributions rather different from 
lognormal. The agreement is surprisingly good (it would be worse at more extreme percen- 
tage points). The two fits generally give points closer to each other than to the exact 
value, i.e. choice between the two methods does not appear to be important. 


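3.4. We turn now to the normal approximation. Cramér (1951) has shown that if H is
a function of the central moments of a multi-dimensional sample of size S, such that H
and its first two derivatives with respect to these moments are continuous, then H is
asymptotically normal. If $H(m_1, \ldots)$ is this function, its mean and variance are asymptotically
$H(\mu_1, \ldots)$, and

$$\sum_i \mu_2(m_i)\left(\frac{\partial H}{\partial m_i}\right)^2 + 2\sum_{i<j} \mu_{11}(m_i, m_j)\left(\frac{\partial H}{\partial m_i}\right)\left(\frac{\partial H}{\partial m_j}\right).$$

Now $q = (x_1 \ldots x_j)/(x_{j+1} \ldots x_n)$ is in the form required by this theorem, with S = 1.
It is sometimes said that q is approximately normally distributed, with mean

$$(\mu_1\mu_2 \ldots \mu_j)/(\mu_{j+1} \ldots \mu_n)$$

and variance

$$\sigma_1^2\left(\frac{\partial q}{\partial x_1}\right)^2 + \ldots + \sigma_n^2\left(\frac{\partial q}{\partial x_n}\right)^2 + 2\rho_{12}\sigma_1\sigma_2\left(\frac{\partial q}{\partial x_1}\right)\left(\frac{\partial q}{\partial x_2}\right) + \ldots,$$

that is, with coefficient of variation

$$100\{\alpha_1^2 + \ldots + \alpha_n^2 + 2\rho_{12}\alpha_1\alpha_2 + \ldots - 2\rho_{j,j+1}\alpha_j\alpha_{j+1} - \ldots\}^{1/2}.$$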










The mean and variance of q agree with these formulae to $O(\alpha^2)$ and $O(\alpha^4)$ respectively when
the $x_i$ are normally distributed, and so do not differ greatly for small $\alpha$. Now the normal
and lognormal distributions do not differ greatly in their 1, 5, 95 and 99% points for small
coefficients of variation. If the lognormal approximation to q is good, the normal approximation
will also be good for small $\alpha$. This appears to be a better justification for the normal
approximation in such cases than reliance on the application of Cramér's asymptotic
theorem to a sample of size one. Brunt (1931) has derived this approximation by a Taylor
expansion of q and the assumption of normality.


4. GENERAL USE OF THE APPROXIMATION


4.1. It is necessary to know how useful the lognormal approximation is in practice.
Since the exact distributions for which we require approximate percentage points are not
usually known, it is impossible to give exact results. Equally, in many problems it is as
reasonable to suppose the variates are lognormally distributed as to consider them normally
distributed, and then the question is trivial.

In Table 3 some typical comparisons with simple distributions are given. It is intuitively
clear that the approximation will improve as the $\alpha$ decrease or as the number of
component variates increases. If we are satisfied by the agreement indicated in Table 3,
we may use the method with confidence in more complicated situations with smaller $\alpha$.

But as the number of variates increases, larger coefficients of variation than those given
in Table 3 may be allowed without impairing the approximation. It is interesting to know
how much larger the coefficients of variation may be. Some quantitative conclusions may
be drawn from the cumulants of log q, where q is the product of n independent normal
variates:

$$q = x_1 x_2 \ldots x_n.$$


Suppose the constant $\mu_1\mu_2 \ldots \mu_n$ has already been removed, so that $x_i$ has mean one and
variance $\alpha_i^2$; suppose also that the lognormal approximation to q has been deemed satisfactory.
We now wish to approximate to $q' = qx_{n+1}$, where $x_{n+1}$ has mean one and variance
$\beta^2$. The lognormal approximation to q' will remain satisfactory for small $\beta$; for large $\beta$ the
distribution of q' will be unduly affected by $x_{n+1}$, and the lognormal approximation will no
longer satisfy us. The problem is to determine conditions on $\beta$, in relation to the $\alpha_i$, which
allow us to use with confidence a new lognormal approximation. These conditions cannot
ensure that the new approximation is from every point of view as good as the old; for
example, the exact and approximate cumulative distributions will not generally coincide
at the points where they coincided before.
It follows from (5) that the fit to log q supposes that

$$u = (\log q - \kappa_1)/\kappa_2^{1/2}$$

is approximately normally distributed with zero mean and unit variance, where

$$\kappa_1 = -\lambda/2 - 3\mu/4 - 5\nu/2,$$

and

$$\kappa_2 = \lambda + 5\mu/2 + 32\nu/3.$$

Here

$$\lambda = \sum_{i=1}^{n}\alpha_i^2, \quad \mu = \sum_{i=1}^{n}\alpha_i^4 \quad\text{and}\quad \nu = \sum_{i=1}^{n}\alpha_i^6.$$

The third and fourth cumulants of u are approximately

$$\kappa_3 = -3\mu/\lambda^{3/2} \quad\text{and}\quad \kappa_4 = 20\nu/\lambda^2,$$

and are of the order of $3\alpha/n^{1/2}$ and $20\alpha^2/n$ respectively.











For q' similar relations hold, with

$$\lambda' = \lambda + \beta^2, \quad \mu' = \mu + \beta^4 \quad\text{and}\quad \nu' = \nu + \beta^6.$$

If the new third cumulant is less than or equal to $\kappa_3$ in absolute value, and the new fourth
cumulant less than or equal to $\kappa_4$, we have grounds for believing that the normal approximation
to log q' and the lognormal approximation to q' will be at least as satisfactory as the
approximations to q. The condition on the third cumulant is approximately

$$\lambda^{3/2}(\mu + \beta^4) \le \mu(\lambda + \beta^2)^{3/2}.$$


Table 3. Exact and lognormal approximation percentage points

Point | Method | R | N | R/R | R/N | N×N | N/N
$100\alpha_1$ | | 20 | 10 | 10 | 10 | 4 | 5
$100\alpha_2$ | | - | - | 5 | 5 | 4 | 5
1% | Exact | 0.804 | 0.767 | 0.842 | 0.846 | 0.872* | 0.848
1% | Fit to q | 0.760 | 0.789 | 0.827 | 0.837 | 0.875 | 0.848
1% | Fit to log q | 0.758 | 0.786 | 0.827 | 0.836 | 0.877 | 0.848
5% | Exact | 0.820 | 0.836 | 0.872 | 0.882 | 0.908* | 0.890
5% | Fit to q | 0.822 | 0.845 | 0.874 | 0.881 | 0.910 | 0.890
5% | Fit to log q | 0.820 | 0.842 | 0.874 | 0.881 | 0.910 | 0.890
95% | Exact | 1.180 | 1.164 | 1.147 | 1.133 | 1.095* | 1.124
95% | Fit to q | 1.200 | 1.173 | 1.144 | 1.134 | 1.096 | 1.124
95% | Fit to log q | 1.203 | 1.175 | 1.144 | 1.134 | 1.096 | 1.124
99% | Exact | 1.196 | 1.233 | 1.188 | 1.185 | 1.135* | 1.180
99% | Fit to q | 1.298 | 1.255 | 1.209 | 1.194 | 1.139 | 1.179
99% | Fit to log q | 1.303 | 1.259 | 1.210 | 1.195 | 1.140 | 1.180

* Exact points communicated by Prof. L. A. Aroian.
We write $b = \beta^2/\lambda$ and $\gamma = \lambda^2/\mu$; then the condition on b is

$$b^3\gamma^2 - b^2 + (2\gamma - 3)b - 3 \le 0.$$

It follows from Cauchy's inequality that $1 \le \gamma \le n$. The condition is satisfied at b = 0 and
until

$$\gamma = \{(b+1)^{3/2} - 1\}/b^2.$$

Similarly, the condition on the fourth cumulant is approximately

$$\lambda^2(\nu + \beta^6) \le (\lambda + \beta^2)^2\nu.$$

We write $\delta = \lambda^3/\nu$; it follows from Cauchy's inequality that $1 \le \delta \le n^2$. The condition
becomes

$$\delta b^2 - b - 2 \le 0,$$

and is satisfied at b = 0 and until

$$b = \{(1 + 8\delta)^{1/2} + 1\}/2\delta.$$
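The two conditions can be solved numerically for the critical $\beta$. The sketch below handles the case in which all the $\alpha_i$ equal $\alpha$, so that $\lambda = n\alpha^2$, $\gamma = n$ and $\delta = n^2$; the function names, the bisection routine and its tolerance are illustrative assumptions. For n = 1 it gives about 1.46$\alpha$ and 1.41$\alpha$.

```python
import math

def critical_b_fourth(delta):
    """Largest b satisfying delta*b^2 - b - 2 <= 0 (closed form)."""
    return (1 + math.sqrt(1 + 8 * delta)) / (2 * delta)

def critical_b_third(gamma, hi=10.0, tol=1e-12):
    """Positive root of 1 + gamma*b^2 = (1 + b)^(3/2), found by bisection.

    This is the squared condition b^3*gamma^2 - b^2 + (2*gamma - 3)*b - 3 = 0
    written before squaring; hi = 10 brackets the root for gamma >= 1.
    """
    def f(b):
        return 1 + gamma * b * b - (1 + b) ** 1.5
    lo = 1e-9  # f is negative just above zero, since the term -3b/2 dominates
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) <= 0:
            lo = mid
        else:
            hi = mid
    return lo

def critical_beta_equal(n):
    """Critical beta/alpha for both criteria when alpha_1 = ... = alpha_n = alpha."""
    lam = n        # lambda / alpha^2
    gamma = n      # lambda^2 / mu
    delta = n * n  # lambda^3 / nu
    return (math.sqrt(critical_b_third(gamma) * lam),
            math.sqrt(critical_b_fourth(delta) * lam))

for n in range(1, 7):
    print(n, critical_beta_equal(n))
```

Since $b = \beta^2/\lambda$, each critical b is converted to $\beta/\alpha$ by multiplying by $\lambda/\alpha^2$ and taking the square root.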
















Similar regions may be defined when the $x_i$ are raised to positive or negative powers, and
for rectangular variates. When q consists of the product of independent rectangular variates
we see from (6) that both $\kappa_3$ and $\kappa_4$ are $O(\alpha^4)$, and hence that the conditions on both these
cumulants result in an acceptable region similar to the first of the two defined above.

4.2. Some practical conclusions are now drawn.

All coefficients of variation equal. Suppose $\alpha_1 = \ldots = \alpha_n = \alpha$ and $x_{n+1}$ has coefficient of
variation $\beta$. Then the lognormal fit to the new product will be as good as the fit to the old
if $\beta$ is less than the value given in Table 4, n = 1 (1) 6; the asymptotic criterion is also shown.





Table 4. Critical values of $\beta$ for $\alpha_1 = \ldots = \alpha_n = \alpha$

n | Criterion $\kappa_3$ | Criterion $\kappa_4$
1 | 1.46α | 1.41α
2 | 1.38α | 1.30α
3 | 1.32α | 1.26α
4 | 1.29α | 1.24α
5 | 1.26α | 1.24α
6 | 1.23α | 1.23α
∞ | 1.22α | 1.19α








Table 5. Critical values of $\beta$, $\alpha_i$ unequal

n | $\alpha_2$ | $\alpha_3$ | Criterion $\kappa_3$ | Criterion $\kappa_4$
2 | α/2 | - | 1.28α | 1.29α
2 | α/5 | - | 1.40α | 1.39α
3 | α/2 | α/2 | 1.27α | 1.25α
3 | α/5 | α/5 | 1.21α | 1.36α










Coefficients of variation unequal. For various values of $\alpha_2$ and $\alpha_3$, Table 5 gives the critical
values of $\beta$ in terms of $\alpha_1 = \alpha$.

It is clear from Tables 4 and 5 that when a new component is added to q and it is known
that the lognormal approximation to q is satisfactory, the coefficient of variation of the
new variate can be of the order of 1.25 times the largest coefficient of variation already
present. When this is the case, we have some confidence that the new q' will also have a
satisfactory lognormal approximation. In this way we may extend the results of Table 3.
We may, for example, conclude that it is likely that the lognormal approximations are
satisfactory for RN/N, $\alpha_1 < 0.1$, $\alpha_2 < 0.06$ and $\alpha_3 < 0.05$, and for RN/NN, $\alpha_1 < 0.1$, $\alpha_2 < 0.06$,
$\alpha_3 < 0.07$ and $\alpha_4 < 0.05$.


5. THE DISTANCE BETWEEN DISTRIBUTIONS 


5.1. We have already pointed out that there are several methods of choosing a lognormal
approximation to q, and we have given examples to show that the fit by moments to q gives
good practical results. It is possible to give this method some theoretical justification,
which of course rests on the criterion adopted for judging an approximation.













Suppose that we are given the cumulative distribution function F(x), for which the
first and second moments $\mu$ and $\sigma^2$ exist. Suppose also the cumulative distribution G(x)
has moments m and $s^2$, and that we are to choose from some family that G(x) which most
resembles F(x). For example, G(x) may be a well-tabulated distribution, and we are at
liberty to choose the parameters m and $s^2$. The manner in which G(x) is to resemble F(x) is
of course critical, and the number of possible criteria is unbounded. We develop first two
almost trivial criteria and then discuss two approaches to the distance between F(x)
and G(x).

5.2. In the first place, suppose we really require two percentage points of F(x); this is the
type of problem we stated initially. Suppose two parameters of G(x) are available, and for
particular values of these parameters the two percentage points of G(x) coincide with those
of F(x). These are the values we should choose. For example, in fitting a normal distribution
to the 5 and 95% points of R/N, $\alpha_1 = 0.1$, $\alpha_2 = 0.05$, we have to solve

$$m - 1.645s = 0.882, \quad m + 1.645s = 1.133,$$

to obtain a unique solution. This method cannot be usefully applied, since if the points were
known we would not require to fit G(x).

Now suppose it is known that a good fit will be obtained by selecting a set of known points
for agreement. For example, the author (1954) gives the 1 and 5% points of R/N; suppose
the 2½% point is required. The lognormal distribution is a reasonable fit to this distribution;
we therefore fit a normal distribution to the logarithms of the 1 and 5% points $p_1$ and $p_5$.
We solve

$$m - 2.326s = \log p_1, \quad m - 1.645s = \log p_5.$$

The 2½% point, $p_{2.5}$, is then given by

$$m - 1.960s = 0.462\log p_1 + 0.538\log p_5 = \log p_{2.5}.$$

Calculations show that this interpolation is within the accuracy of the tables given. The
method is also exact when applied to interpolation or extrapolation in Table 1 of this paper.
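The interpolation rule amounts to a weighted mean of the logarithms of the two known points, with weights fixed by the normal deviates. A minimal sketch follows; the function name and argument order are illustrative, and the worked values are the v = 10 entries of Table 1 (0.7890 and 0.8445) rather than points of R/N.

```python
import math
from statistics import NormalDist

def interpolate_log_point(p_a, t_a, p_b, t_b, t):
    """Point at t per cent from known points p_a, p_b at t_a, t_b per cent,
    obtained by fitting a normal distribution to the logarithms."""
    nd = NormalDist()
    z_a, z_b, z = nd.inv_cdf(t_a / 100), nd.inv_cdf(t_b / 100), nd.inv_cdf(t / 100)
    w = (z - z_b) / (z_a - z_b)   # weight attached to log p_a
    return math.exp(w * math.log(p_a) + (1 - w) * math.log(p_b)), w

p_25, w = interpolate_log_point(0.7890, 1, 0.8445, 5, 2.5)
print(round(w, 3), round(1 - w, 3), p_25)
```

The weights agree with the 0.462 and 0.538 quoted above, and the same function extrapolates when t lies outside the two known levels.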


5.3. The distance between two cumulative distribution functions at the value x is
naturally defined as some monotonic increasing function, M(z), of $|F(x) - G(x)|$. The
distance between the two functions may then be defined by either

$$\int_{-\infty}^{\infty} M(|F(x) - G(x)|)\,\psi\{F(x)\}\,dF(x),$$

or

$$\sup_x\,[M(|F(x) - G(x)|)\,\psi\{F(x)\}],$$

where $\psi(z)$ is some weighting function. The first definition is associated with Cramér (1928),
von Mises (1931) and Smirnov (1936); the second with Kolmogorov (1933).

We are concerned with distances along the x-axis rather than perpendicular to it. That
is, we aim to fit approximate percentage points which are close to the true points rather
than to fix points at which F(x) is close to the desired value.

Suppose that F(x) = p and G(x) = p have respectively the unique inverses $x = \phi(p)$ and
$x = \gamma(p)$ for almost all p. We are concerned with a distance between the functions at p of
$M(|\phi(p) - \gamma(p)|)$, and we require definitions of distance between the distributions in the
form

$$\int_0^1 M(|\phi(p) - \gamma(p)|)\,\psi(p)\,dp,$$

or

$$\sup_p\,[M(|\phi(p) - \gamma(p)|)\,\psi(p)].$$










We do not here consider the second definition, which gives an unbounded distance for
certain very simple cases, e.g. if F(x) and G(x) are normal distributions differing only in
variance, M(z) = z and $\psi(p) = 1$.

A natural form of the first definition for our purpose is given by $M(z) = z^2$, and $\psi(p)$ the
characteristic function of some set E contained in the interval (0, 1) in which we are specially
interested. For example, E might be the interval (0, 1) itself, or the union of the intervals
(0.001, 0.05) and (0.95, 0.995).

We therefore define the distance $\theta$ between F(x) and G(x) by

$$\theta = \int_E \{\phi(p) - \gamma(p)\}^2\,dp \Big/ |E|.$$

The standardized form of x corresponding to F(x) is $(x - \mu)/\sigma$; let $\xi(p)$ be the inverse of the
distribution function corresponding to the standardized form and $\eta(p)$ the similar function
derived from G(x):

$$\phi(p) = \mu + \sigma\xi(p), \quad \gamma(p) = m + s\eta(p). \tag{7}$$

It follows that

$$|E|\,\theta = (\mu - m)^2 + 2\sigma(\mu - m)\int_E \xi(p)\,dp - 2s(\mu - m)\int_E \eta(p)\,dp + \int_E \{\sigma\xi(p) - s\eta(p)\}^2\,dp. \tag{8}$$


Consider the particular case |E| = 1, that is, E consists of all points in (0, 1) except
possibly for a set of measure zero. For instance, we would exclude values of p for which
$\xi(p)$ and $\eta(p)$ are not unique. Here

$$\theta = (\mu - m)^2 + \sigma^2 - 2\sigma sI + s^2,$$

where

$$I = \int_0^1 \xi(p)\eta(p)\,dp.$$

Now if G(x) differs from F(x) in mean and variance at most, I = 1, and

$$\theta = (\mu - m)^2 + (\sigma - s)^2.$$

This is reduced to zero by choosing $m = \mu$ and $s = \sigma$.

In all other cases I < 1. To minimize $\theta$ we should choose $m = \mu$ and $s = I\sigma$, when $\theta$ takes
the value $\sigma^2(1 - I^2)$. If more than the mean and variance can be chosen we should make
our choice to maximize I.

By this criterion the method of fitting G(x) by moments is correct in that it sets $m = \mu$,
but is not optimum in taking $s = \sigma$. The method gives $\theta = 2\sigma^2(1 - I)$. The optimum
procedure is to calculate I and then to choose s less than $\sigma$ accordingly. However, the method
of moments will not be far from optimum if I is near one, i.e. if the standardized forms of
F(x) and G(x) are not too different. Suppose $I = 1 - \epsilon$, where $\epsilon$ is small. By the optimum
procedure $\theta = 2\sigma^2\epsilon - \sigma^2\epsilon^2$, and by the method of moments $\theta$ is only slightly greater: $\theta = 2\sigma^2\epsilon$.

It appears that $\epsilon$ is often small, and in this sense the method of fitting by moments is
nearly optimum. I cannot usually be calculated in the cases for which we require approximations.
But in fitting a normal distribution to $\sqrt{(2\chi^2)}$, where the $\chi^2$-distribution has 30 d.f.,
we obtain I = 0.995. In fitting a normal distribution even to R/R, $\alpha_1 = 0.1$, $\alpha_2 = 0.05$, we
obtain I as large as 0.92.
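Where both inverse functions are available, I can be estimated by simple quadrature. The sketch below takes a case not treated in the text, a normal distribution fitted to a single rectangular variate, purely as an illustration; the function name, the midpoint rule and the number of subintervals are assumptions. For this pair the integral is known in closed form, $I = \sqrt{3/\pi} \approx 0.977$.

```python
import math
from statistics import NormalDist

def overlap_integral(xi, eta, n=200_000):
    """Midpoint-rule estimate of I = integral over (0, 1) of xi(p) * eta(p) dp."""
    h = 1.0 / n
    return h * sum(xi((k + 0.5) * h) * eta((k + 0.5) * h) for k in range(n))

def xi(p):
    """Standardized inverse distribution function of a rectangular variate."""
    return math.sqrt(3) * (2 * p - 1)

eta = NormalDist().inv_cdf   # standardized inverse of the normal distribution

I = overlap_integral(xi, eta)
sigma = 1.0
theta_optimum = sigma**2 * (1 - I**2)   # choosing m = mu, s = I * sigma
theta_moments = 2 * sigma**2 * (1 - I)  # choosing m = mu, s = sigma
print(I, theta_optimum, theta_moments)
```

The two values of $\theta$ differ by $\sigma^2(1 - I)^2$, which is small when I is near one.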













The more general case, |E| < 1, will not be discussed here except to point out that the
procedure of minimizing $\theta$ is very similar. If we write (8) as

$$|E|\,\theta = (\mu - m)^2 + 2(\mu - m)(U\sigma - Vs) + \sigma^2X - 2\sigma sY + s^2Z,$$

we see we must choose

$$m = \mu + (VY - UZ)\sigma/(V^2 - Z), \quad s = (UV - Y)\sigma/(V^2 - Z), \tag{9}$$

provided G(x) is non-singular.
If, for example, F(x), G(x) and E are symmetrical about their means, (9) reduces to

$$m = \mu \quad\text{and}\quad s = Y\sigma/Z.$$


5.4. Although the preceding section may be regarded as a refinement and in part a
justification of the method of fitting by moments, it cannot be applied as it stands to the
lognormal approximation to q. For the lognormal, and for a wide class of distributions, the
standard form $\eta(p)$ of § 5.3 does not exist. It exists only if the family G(x) is linear. The
product and quotient of standard forms are not a linear family, nor can they be transformed
into a linear family. This is evident from the fact that such a product, say (a + bx)(c + dy),
or quotient has three parameters (ac, b/a and d/c), while a linear family has only two.

Although we cannot fit a best lognormal to a general product or quotient of variates, the
argument above still shows that the method of fitting by moments to q is nearly optimum.
The definitions of $\xi(p)$ and $\eta(p)$ must now follow from (7).

The method of fitting to log q is nearly optimum in another metric, in which the distance
between F(x) and G(x) is

$$\int_0^1 [\log\{\phi(p)/\gamma(p)\}]^2\,dp.$$


6. CONCLUSIONS 


We return to the problem of setting limits to

$$q = (x_1x_2 \ldots x_j)/(x_{j+1} \ldots x_n).$$

There is no single answer to the question, what approximation should we use to calculate
good limits? We must use judgement in selecting the known distribution used in the calculation:
normal, lognormal, or one of those described in § 2. When the family is chosen, the
method of fitting by moments to q is recommended. It is recommended only because the
metric in which it is nearly optimum is the more natural; in practice there is little difference
between the two approximations. Given Tables 1 and 2, the fit to q is easily calculated;
without Table 1, the fit to log q is simpler.

If the number of component variates is larger than two, the lognormal approximation
will give satisfactory results when the coefficients of variation of the components are small
and not too different. It is therefore a suitable approximation for general use. If the
coefficient of variation of q is sufficiently small, the normal approximation gives very similar
results.

To calculate the first two moments of q it is necessary to know the coefficient of variation,
or quotient of half-range and mean, of each component, and then to combine the values given
in Table 2. The percentage points are then given in Table 1, interpolation and extrapolation
for other percentage points being discussed in § 5.2. Finally, these points are to be multiplied
by

$$(\mu_1\mu_2 \ldots \mu_j)/(\mu_{j+1} \ldots \mu_n).$$










The example we gave in § 1 may now be completed. Suppose, for example, H = NRN/NN
(i.e. W is measured with rectangular error, the other components with normal error) and
$\alpha_1 = 0.02$, $\alpha_2 = 0.01$, $\alpha_3 = 0.005$, $\alpha_4 = 0.015$, $\alpha_5 = 0.005$ and $\alpha_6 = 0.005$. We find the mean
and coefficient of variation of H are 1.000275 and 2.57%.

The 1 and 99% limits to H are 0.941 and 1.061 times its calculated value, i.e. the efficiency
has been determined plus or minus about 6%. The normal approximation gives very similar
limits: 0.937 and 1.063.

If, however, the $\alpha$ of the rectangular component were large in relation to the others, we might
prefer to use R/N rather than the lognormal as an approximation, and to obtain percentage
points from Broadbent (1954).


The author thanks Prof. G. A. Barnard for his encouragement and assistance, Prof.
E. S. Pearson for his help in preparing the work for publication, and the Director General
of the British Coal Utilization Research Association for permission to publish it.


REFERENCES 


AROIAN, L. A. (1947). The probability function of the product of two normally distributed variables.
Ann. Math. Statist. 18, 265.

BROADBENT, S. R. (1954). The quotient of a rectangular or triangular and a general variate.
Biometrika, 41, 330.

BRUNT, D. (1931). The Combination of Observations, 2nd ed. Cambridge University Press.

CRAIG, C. C. (1936). On the frequency function of xy. Ann. Math. Statist. 7, 1.

CRAMÉR, H. (1928). On the composition of elementary errors. Skand. AktuarTidskr. 11, 13.

CRAMÉR, H. (1951). Mathematical Methods of Statistics. Princeton University Press.

CREASY, M. A. (1954). Limits for the ratio of means. J. R. Statist. Soc. B, 16, 186.

DANIELS, H. E. (1954). Saddlepoint approximations in statistics. Ann. Math. Statist. 25, 631.

DERKSEN, J. B. D. (1939). On some infinite series introduced by Tschuprow. Ann. Math. Statist.
10, 380.

FIELLER, E. C. (1932). The distribution of the index in a normal bivariate population. Biometrika,
24, 428.

FINNEY, D. J. (1941). On the distribution of a variate whose logarithm is normally distributed.
J. R. Statist. Soc. Suppl. 7, 155.

GADDUM, J. H. (1945). Log-normal distributions. Nature, Lond., 156, 463.

GEARY, R. C. (1930). The frequency distribution of the quotient of two normal variates. J. R. Statist.
Soc. A, 93, 442.

HALDANE, J. B. S. (1942). Moments of the distributions of powers and products of normal variates.
Biometrika, 32, 226.

JOHNSON, N. L. (1949). Systems of frequency curves generated by methods of translation. Biometrika,
36, 149.

KOLMOGOROV, A. (1933). Sulla determinazione empirica di una legge di distribuzione. G. Ist. ital.
Attuari, 4, 83.

PEARSON, K. (1922). Tables of the Incomplete Gamma Function. Cambridge University Press.

SHELLARD, G. D. (1952). Estimating the product of several random variables. J. Amer. Statist.
Ass. 47, 216.

SMIRNOV, N. (1936). Sur la distribution de ω². C.R. Acad. Sci., Paris, 202, 449.

VON MISES, R. (1931). Wahrscheinlichkeitsrechnung. Vienna: Deuticke.

WICKSELL, S. D. (1917). On the genetic theory of frequency. Ark. Mat. Astr. Fys. 12, no. 20.

WICKSELL, S. D. (1921). An exact formula for spurious correlation. Metron, 1, 33.


A REJECTION CRITERION BASED UPON THE RANGE†


By C. I. BLISS, W. G. COCHRAN AND J. W. TUKEY‡

The experimenter is occasionally faced with an inexplicable, 'aberrant' observation in an
otherwise valid set of data. If it is defective and he accepts it, or if it is sound and he rejects
it, his results will be biased. By following an objective rule rather than a subjective
impression, he can control, and perhaps minimize, his risk of making a wrong decision. The
rule proposed here is intended for data consisting of equal-sized sets of replicate measurements,
the several sets possibly varying in their means but all being samples from populations
with the same variance. It was developed originally to meet the need for a simple
rejection criterion for use with several of the bioassays in the U.S. Pharmacopeia XV (1955).
Other applications could be cited.

For the test which we propose, the range is computed from each set of n measurements,
there being k sets in all. The largest range is divided by the sum of all the ranges. The
resulting ratio T is compared with the tabular value for the appropriate n and k at the
probability level of P = 0.05. If the observed ratio exceeds the tabular value (cf. Tables 1
and 2), the set represented by this largest range is assumed to contain an aberrant observation
or outlier, which is identified by inspection and rejected. Thus the proposed criterion is
closely related to Cochran's test (1941) for the largest variance in a series. The application
of this rule controls the probability of biasing a result by failing to reject an outlier.

The test may be illustrated numerically with data from two biological assays that were
submitted in collaborative studies sponsored by the U.S. Pharmacopeia. The first is an
assay of corticotropin from the concentration of ascorbic acid in the adrenal glands of
hypophysectomized rats. Seven rats were assigned at random to each of six dosage groups.
Each group, in turn, was injected with one of three dosages of the standard preparation
($S_1$ to $S_3$) and of the test or unknown preparation ($U_1$ to $U_3$). The response y was determined
separately from the adrenal glands of each rat. The total for each treatment group is given
below, together with its range and sum of squares ($6s^2$):

Dose | $S_1$ | $S_2$ | $S_3$ | $U_1$ | $U_2$ | $U_3$
Total response $\Sigma y$ | 28.92 | 24.87 | 20.62 | 26.95 | 25.17 | 20.22
Range (n = 7) | 0.86 | 0.79 | 0.37 | 1.49 | 0.80 | 0.76
Variance × 6, $6s^2$ | 0.442 | 0.362 | 0.117 | 1.642 | 0.455 | 0.467





The range for $U_1$ (1.49) is the largest and most suspect; when it is divided by the sum of
the ranges, we obtain T = 1.49/5.07 = 0.294. This exceeds the upper 5% point of T = 0.288
for k = 6 and n = 7, indicating an outlier. The identification of this group as unusual by
our proposed range criterion may be checked against Cochran's test; its critical ratio of


† Prepared, in part, in connexion with research sponsored by the Office of Naval Research.
‡ From The Connecticut Agricultural Experiment Station and Yale University, Johns Hopkins
University, and Princeton University, respectively.







$1.642/3.485 = 0.471$ exceeds the upper 1 % point (0.461) in the table by Eisenhart, Hastay 
& Wallis (1947). Among the individual $y$'s in this group (2.68, 3.90, 4.00, 4.02, 4.06, 4.12 and 
4.17), the smallest response not only falls considerably short of the others, but also 
contributes to a suspiciously small total response for its dose, when compared with the totals 
for the other groups. 
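
The arithmetic of this first example can be retraced with a short sketch. It assumes only the six ranges and six sums of squares quoted in the table above; the helper name `largest_share` is hypothetical and is not taken from the paper.

```python
# Ranges and sums of squares (6s^2) of the six dosage groups, as quoted in the text.
ranges = [0.86, 0.79, 0.37, 1.49, 0.80, 0.76]
six_s2 = [0.442, 0.362, 0.117, 1.642, 0.455, 0.467]


def largest_share(values):
    """Ratio of the largest value to the total of all values (hypothetical helper)."""
    return max(values) / sum(values)


T = largest_share(ranges)   # range criterion: largest range / sum of ranges
C = largest_share(six_s2)   # Cochran's ratio: largest variance / sum of variances

print(f"T = {T:.3f}  (tabled upper 5 % point for k = 6, n = 7: 0.288)")
print(f"Cochran ratio = {C:.3f}  (tabled upper 1 % point: 0.461)")
```

Both ratios have the same form, which is why one helper serves for the range criterion and for Cochran's test.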

Our second example is from a turbidimetric assay of vitamin $\mathrm{B}_{12}$. Four sets of triplicate 
tubes were prepared for each of six dosage levels of the reference standard. One set was 
placed in each of four tube racks, together with triplicate tubes for four dosage levels of a 
sample or unknown in two of the four racks. Within each rack the tubes were intermingled 
at random. After incubation overnight, the percentage transmittance of each tube was read 
in a photometer. The ranges of the $k = 32$ sets of $n = 3$ tubes had the following distribution: 


Observed range of 3     0    1    2    3    4    5    6    7    8   13
No. of sets             1    5    6    8    4    2    3    1    1    1


From the total of the ranges (= 113), the observed ratio $T = 13/113 = 0.115$. Since $k > 10$, 
we enter Table 2 with $(k+2)T = 34 \times 0.115 = 3.91$, which much exceeds the value of 3.08, 
interpolated between 3.03 and 3.11 in the column for $n = 3$. On the basis of all 32 ranges, 
we attribute the range of 13 to an outlier. The three readings comprising this set show 
transmittances of 24, 34 and 37 %, from which the reading of 24 % would be rejected as aberrant. 
The test ratio may now be recomputed with the largest of the 31 remaining ranges in the 
numerator to obtain $T = 8/100 = 0.08$. Since $33 \times 0.08 = 2.64$ is less than the interpolated 
value (3.07) for $n = 3$ and $k = 31$, no other observation would be rejected from the series. 
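
The two steps of this second example reduce to the same small calculation, sketched below. Only the totals quoted in the text are used (113 for all 32 ranges, 100 after the set with range 13 is dropped); the function name `scaled_ratio` is an illustrative choice.

```python
def scaled_ratio(largest, total, k):
    """Return T = largest/total and (k + 2) * T, the quantity tabulated for k > 10."""
    t = largest / total
    return t, (k + 2) * t


# All 32 ranges: largest range 13, total 113.
t_all, scaled_all = scaled_ratio(13, 113, 32)
# After setting aside the suspect set: largest remaining range 8, total 113 - 13.
t_rest, scaled_rest = scaled_ratio(8, 113 - 13, 31)

print(f"first pass : T = {t_all:.3f}, (k+2)T = {scaled_all:.2f}  vs. 3.08")
print(f"second pass: T = {t_rest:.3f}, (k+2)T = {scaled_rest:.2f}  vs. 3.07")
```

The first scaled ratio exceeds its interpolated critical value and the second does not, in agreement with the conclusion drawn in the text.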


DISTRIBUTION 


The critical values of the ratio $T$ in Table 1 have been computed for $k = 2$ to 10 ranges, each 
determined from a set of $n = 2$ to 10 measurements. Table 2 gives critical values of $(k+2)T$ 
for 10 to 50 ranges. Together they should cover the situations that occur most frequently. 
The original variates are assumed to be distributed normally with a common standard 
deviation within groups. Since our criterion is homogeneous, we may take $\sigma = 1$ without 
loss of generality. 
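
Under exactly these assumptions (normal variates, common standard deviation, $\sigma = 1$) a tabled value can be checked roughly by simulation. The sketch below is a Monte Carlo check, not the approximation method used to build the tables; the function names, the seed and the number of replications are arbitrary choices made for illustration.

```python
import random


def simulate_T(k, n, reps, seed=0):
    """Simulate the ratio (largest range) / (sum of ranges) for k normal samples of size n."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        ranges = []
        for _ in range(k):
            xs = [rng.gauss(0.0, 1.0) for _ in range(n)]  # sigma = 1 without loss of generality
            ranges.append(max(xs) - min(xs))
        values.append(max(ranges) / sum(ranges))
    return values


def upper_point(values, p=0.05):
    """Empirical upper 100p % point of a list of simulated values."""
    ordered = sorted(values)
    return ordered[int((1 - p) * len(ordered))]


crit = upper_point(simulate_T(k=6, n=7, reps=20000))
print(f"simulated upper 5 % point for k = 6, n = 7: {crit:.3f} (Table 1 gives 0.288)")
```

With 20,000 replications the simulated point should agree with the tabled 0.288 to about two decimal places; closer agreement would need many more replications.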

Let $w_1, w_2, \ldots, w_k$ be a set of $k$ ranges from normal samples, each of size $n$ and independently 
and identically distributed. Let $w_{\max}$ be the largest $w$ and $W$ the sum of the $w$'s. Then 

$$T_k = \frac{w_{\max}}{W} = \frac{w_{\max}}{(W - w_{\max}) + w_{\max}} = \frac{w_{\max}/(W - w_{\max})}{1 + w_{\max}/(W - w_{\max})} = \frac{S_{k-1}}{1 + S_{k-1}},$$

where 

$$S_{k-1} = \frac{w_{\max}}{w_1 + w_2 + \ldots + w_k - w_{\max}} = \max_i \frac{w_i}{w_1 + \ldots + w_{i-1} + w_{i+1} + \ldots + w_k}.$$

Thus the distribution of $T_k$ is determined by that of $S_{k-1}$, which is the largest of $k$ identically 
distributed quantities of the form 

$$R_{k-1} = \frac{w_1}{w_2 + w_3 + \ldots + w_k}.$$











420 Rejection criterion based upon the range 


The argument used in developing Cochran's test (1941) for the largest variance can be 
applied to relate the percentage points of $S_{k-1}$ to those of $R_{k-1}$. The value of $S_{k-1}$ for $P = 0.05$ 
falls between the values of $R_{k-1}$ for 

$$P_1 = 0.05/k \quad\text{and}\quad P_2 = 1 - (0.95)^{1/k}.$$

Roughly, $P_1$ is $0.98P_2$ for all $k$; for $k = 2$, the two values are $P_1 = 0.025$ and $P_2 = 0.02532$, 
and for $k = 20$, they are $P_1 = 0.0025$ and $P_2 = 0.002558$. Since investigation by one of us 
suggested that the desired value of $R_{k-1}$ was closer to $P_1$, we have used the compromise 
probability $P^* = \tfrac{1}{3}(2P_1 + P_2)$, except where $T_k > \tfrac12$ (that is, $S_{k-1} > 1$) when $P_1$ is exact. 
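
The two bounding probabilities and their compromise are easy to tabulate. The sketch below assumes that the first bound is $0.05/k$, the second is $1 - (0.95)^{1/k}$, and that the compromise gives double weight to the first; the assignment of subscripts and weights is a reading of the printed formula rather than something confirmed independently, and the function names are hypothetical.

```python
def p_bounds(k, alpha=0.05):
    """Return the two bounding tail probabilities for a single ratio R_{k-1}."""
    p1 = alpha / k                           # largest of k treated as k exclusive events
    p2 = 1.0 - (1.0 - alpha) ** (1.0 / k)    # largest of k treated as k independent events
    return p1, p2


def p_star(k, alpha=0.05):
    """Compromise probability, assuming double weight on the first bound."""
    p1, p2 = p_bounds(k, alpha)
    return (2.0 * p1 + p2) / 3.0


for k in (2, 6, 10):
    p1, p2 = p_bounds(k)
    print(f"k = {k:2d}: P1 = {p1:.5f}, P2 = {p2:.5f}, P* = {p_star(k):.5f}")
```

The two bounds differ by only a few per cent, so the choice of weights has a small effect on the resulting critical values.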

The calculation of critical levels of $T_k$ for $P = 0.05$ is thus reduced to the calculation of 
critical values of $R_{k-1}$ for $P = P^*$, from which 

$$T_{0.05} = \frac{R_{k-1,P^*}}{1 + R_{k-1,P^*}}.$$

Since the numerator and denominator of $R_{k-1}$ are independent, several approximations 
are available. 


Table 1. Upper 5 % points of T, the ratio of the largest of k independent normal ranges, 
each of n observations, to the total of these ranges 

No. of                     No. of observations n in each range
ranges k      2       3       4       5       6       7       8       9      10

    2       0.962   0.862   0.803   0.764   0.736   0.717   0.702   0.691   0.682
    3        .813    .667    .601    .563    .539    .521    .507    .498    .489
    4        .681    .538    .479    .446    .425    .410    .398    .389    .382
    5        .581    .451    .398    .369    .351    .338    .328    .320    .314

    6       0.508   0.389   0.342   0.316   0.300   0.288   0.280   0.273   0.267
    7        .451    .342    .300    .278    .263    .253    .245    .239    .234
    8        .407    .305    .267    .248    .234    .225    .218    .213    .208
    9        .369    .276    .241    .224    .211    .203    .197    .192    .188
   10        .339    .253    .220    .204    .193    .185    .179    .174    .172





Table 2. Upper 5 % points of (k + 2)T, where T is the ratio of the largest of k independent 
normal ranges, each of n observations, to the total of these ranges 

No. of                     No. of observations n in each range
ranges k      2       3       4       5       6       7       8       9      10

   10        4.06    3.04    2.65    2.44₅   2.30₅   2.21    2.14₅   2.09    2.05
   12        4.06    3.03    2.63₅   2.42₅   2.29    2.20    2.13    2.07₅   2.04
   15        4.06    3.02    2.62₅   2.41₅   2.28    2.18₅   2.12    2.06₅   2.02₅
   20        4.13    3.03    2.62    2.41₅   2.28    2.18₅   2.11    2.05    2.01
   50        4.26    3.11    2.67    2.44    2.29    2.19    2.11    2.06    2.01













APPROXIMATE P* VALUES OF $R_{k-1}$ 


The critical values of $R_{k-1}$ were approximated by two methods discussed below. Most of 
the values in Table 1 were obtained by method A, and most of those in Table 2 by method B. 
The two agree moderately well, and intermediate tabular values give some weight to both 
methods.† 

A range or total range is often approximated by a multiple of the square root of a $\chi^2$-variate, 
i.e. by a $\chi$-variate. Equivalent degrees of freedom $\nu$ and scale factors $c$ for a single 
range are given by Thomson (1953), and for mean ranges by David (1951). From these, the 
approximate distributions of $w_1$ and of $\bar w$ are 

$$w_1 \sim c_1\chi_1/\sqrt{\nu_1}, \quad\text{and}\quad \bar w = \frac{w_2 + w_3 + \ldots + w_k}{k-1} \sim c_2\chi_2/\sqrt{\nu_2}.$$

When required, more accurate scale factors for mean or total ranges can be found from 
$c^2 = d_n^2 + V_n/k$ (David, 1951). 

Method A consists of the straightforward approximation 

$$R_{k-1} = \frac{w_1}{w_2 + w_3 + \ldots + w_k} \sim \frac{c_1\chi_1/\sqrt{\nu_1}}{(k-1)c_2\chi_2/\sqrt{\nu_2}} = \frac{c_1}{(k-1)c_2}\sqrt{F},$$

where $\sqrt{F} = \dfrac{\chi_1/\sqrt{\nu_1}}{\chi_2/\sqrt{\nu_2}}$ and $F$ has $\nu_1$ and $\nu_2$ degrees of freedom. The required values of $\sqrt{F}$ for 
critical levels of $P^*$ are obtained by interpolation in a table of $F$. 
Method B is less direct and more refined, the $\chi$ approximation appearing only in the 
denominator of $R_{k-1}$. Let 

$$u = \frac{w_2 + w_3 + \ldots + w_k}{(k-1)c_2}.$$

Then the expected mean square of $u$ is $\sigma^2$, and we can regard 

$$R_{k-1} = \frac{w_1}{w_2 + w_3 + \ldots + w_k} = \frac{w_1}{(k-1)c_2\,u}$$

as the result of studentizing $w_1/(k-1)c_2$ with the scale estimator $u$. We know the result of 
studentizing a $\chi$-variate (at probability $P^*$) with the scale-estimator based on $\nu_2$ degrees 
of freedom. In that case $(F_{\nu_1,\infty})^{1/2}$ is converted into $(F_{\nu_1,\nu_2})^{1/2}$ and the $P^*$ value is increased in 
the ratio $\sqrt{F/F_\infty}$, where $F$ is the $P^*$ value of $F_{\nu_1,\nu_2}$ and $F_\infty$ is the $P^*$ value of $F_{\nu_1,\infty}$. If we take 
this same studentizing factor as applicable to the studentization of $w_1/(k-1)c_2$ at $P^*$ by $u$, 
we have, as the method B approximation, 

$$R_{k-1} \sim \frac{w}{(k-1)c_2}\sqrt{\frac{F}{F_\infty}},$$

where $w$ is now the range of $n$ unit normally distributed items at $P^*$ (Pearson, 1942). To 
improve the accuracy of linear interpolation for non-tabular values of $\nu_1$, $\nu_2$ and $P^*$, selected 
values of $\sqrt{F/F_\infty}$ were first computed from the five-figure tables of Merrington & Thompson 
(1943), supplemented by the Hald tables (1952) for larger values of $\nu_2$ and for $P = 0.001$. 
The two approximations differ by the factor 

$$\frac{w}{c_1\sqrt{F_\infty}} = \frac{w}{c_1\chi_1/\sqrt{\nu_1}},$$


† It is not claimed that the entries in the tables are correct to the last figure given, but we believe 
that most of them will not differ from the true values by more than a few units in the last place. 













which would be unity if the $\chi$-variate approximation to the numerator range were perfect. 
Method B has the advantage of allowing for idiosyncrasies in the numerator range, rather 
than hiding them in a $\chi$-variate approximation. For $k > 2$, the total range in the 
denominator is approximated more closely by the $\chi$-variate than is the simple range in the 
numerator.† 


THE CASE OF n=2 


For ranges of 2, $w_1 = \sqrt{2}\,|x|$, where $x$ is a unit normal deviate. Methods A and B are then 
identical, since $w_1$ is exactly a $\chi$-variate. The desired critical values are related directly 
to those of Lord's (1947) statistic 

$$u(k-1, 2) = \frac{d_2\,|x|}{\bar w} = \frac{(k-1)\,d_2\,|x|}{w_2 + w_3 + \ldots + w_k},$$

where $\bar w$ is the mean of $k-1$ ranges, and $d_2 = 1.1284$ is the average range of a sample of 
$n = 2$. The critical values of $R_{k-1}$ are thus $\sqrt{2}/\{d_2(k-1)\}$ times those of $u(k-1, 2)$. Despite 
rather substantial interpolation for $P^*$ at some values of $k$, we have used Lord's tables in 
computing the values for $n = 2$ in Tables 1 and 2. 
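
The conversion factor for $n = 2$ can be written out explicitly. The sketch below uses the closed form $2/\sqrt{\pi}$ for the mean range of two unit normal observations, which reproduces the quoted 1.1284; the function name `r_from_lord` is hypothetical and no values from Lord's tables are assumed.

```python
import math

# Mean range of two unit normal observations: E|x1 - x2| = 2 / sqrt(pi).
d2 = 2.0 / math.sqrt(math.pi)


def r_from_lord(u_crit, k):
    """Convert a critical value of Lord's u(k-1, 2) into one for R_{k-1} when n = 2."""
    return math.sqrt(2.0) / (d2 * (k - 1)) * u_crit


print(f"d2 = {d2:.4f}")
print(f"conversion factor for k = 6: {r_from_lord(1.0, 6):.4f}")
```

Calling `r_from_lord` with a unit critical value returns the conversion factor itself, which decreases in proportion to $1/(k-1)$.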


REFERENCES 


Cochran, W. G. (1941). The distribution of the largest of a set of estimated variances as a fraction of 
their total. Ann. Eugen., Lond., 11, 47. 

David, H. A. (1951). Further applications of range to the analysis of variance. Biometrika, 38, 393. 

Eisenhart, C., Hastay, M. W. & Wallis, W. A. (1947). Selected Techniques of Statistical Analysis. 
New York: McGraw-Hill Book Co. 

Hald, A. (1952). Statistical Tables and Formulas. New York: John Wiley and Sons. 

Lord, E. (1947). The use of range in place of standard deviation in the t-test. Biometrika, 34, 41. 

Merrington, M. & Thompson, C. M. (1943). Tables of percentage points of the inverted beta (F) 
distribution. Biometrika, 33, 73. 

Pearson, E. S. (1942). The probability distribution of the range in samples of n observations from 
a normal population. Biometrika, 32, 301. 

Thomson, G. M. (1953). Scale factors and degrees of freedom for small sample sizes for χ-approximation 
to the range. Biometrika, 40, 449. 

Tukey, J. W. (1956). Every man his own studentizer. Memorandum Report 58. Statistical Res. 
Grp. Princeton University. 

United States Pharmacopeia XV (1955). Easton, Pa.: Mack Publishing Co. 


† Since the completion of the present calculations method B has been studied further by one of the 
authors (Tukey, 1956). This investigation indicates that, as used here, method B should give slight, 
but only slight, overestimates of the critical values. 







CONFIDENCE INTERVALS FOR A PROPORTION 


By EDWIN L. CROW* 
U.S. Naval Ordnance Test Station, China Lake, California 


1. SUMMARY AND DEFINITIONS 


Tables of confidence intervals for a proportion based on the sample proportion are presented, 
calculated by a slight modification of the method proposed by Sterne (1954), for fixed sample 
sizes up to 30 and confidence coefficients of 0.90, 0.95 and 0.99. This system is compared, 
especially in shortness, with Sterne's system, Clopper & Pearson's (1934), and another, 
intermediate system. It is assumed that a random sample of fixed size $n$ is drawn from an 
infinite population containing a proportion $\pi$ of individuals with a given characteristic, 
that $r$ individuals are observed to have the characteristic, and that it is desired to estimate 
$\pi$ by means of a two-sided confidence interval. 

While relatively tedious to calculate, Sterne's system and its modification have in 
common the advantage that no system is possible which has shorter total length of the 
$n+1$ confidence intervals for $r = 0, 1, \ldots, n$. However, they are at a disadvantage if 
one-sided confidence intervals and tests are also of interest. 

Neyman's definition of a confidence interval $\delta(r)$ (or system of confidence intervals), 
modified for the case of a single, discrete variable $r$ and one parameter $\pi$, consists of the 
requirement that, whatever be $\pi$, the random interval $\delta(r)$ cover the true value of $\pi$ with a 
probability at least equal to a prescribed number called the confidence coefficient, say $1-\epsilon$. 
Neyman showed that construction of a system $\delta(r)$ is equivalent to the determination, for 
each $\pi$, of regions of acceptance $A(\pi)$ such that: 

(i) $P\{r \in A(\pi) \mid \pi\} \geq 1-\epsilon$. 

(ii) Every $r$ is included in at least one $A(\pi)$. 

(iii) The set of values of $\pi$ whose regions $A(\pi)$ contain $r$ is a closed interval. 

This interval is the confidence interval for $\pi$ to be used when the value $r$ is observed. 

In the present case of binomial sampling $r$ takes only the integral values $0, 1, 2, \ldots, n$, 
and the $A(\pi)$ will be taken as successive integers $r$, $r_1 \leq r \leq r_2$ say, such that 

$$\sum_{r=r_1}^{r_2} p_{n,r}(\pi) \geq 1-\epsilon, \qquad p_{n,r}(\pi) = \frac{n!}{r!\,(n-r)!}\,\pi^r (1-\pi)^{n-r}. \qquad (1)$$
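
Condition (1) is simple to evaluate numerically. The following sketch computes the binomial term and the probability content of a run of successive integers; the function names `pmf` and `coverage` are illustrative, not notation from the paper.

```python
from math import comb


def pmf(n, r, pi):
    """Binomial probability of r successes in n trials with success probability pi."""
    return comb(n, r) * pi**r * (1.0 - pi) ** (n - r)


def coverage(n, r1, r2, pi):
    """Probability that r falls in the acceptance region r1 <= r <= r2."""
    return sum(pmf(n, r, pi) for r in range(r1, r2 + 1))


# For n = 9 and pi = 0.25 the region 0..4 satisfies (1) with eps = 0.10, but 1..4 does not.
print(f"{coverage(9, 0, 4, 0.25):.4f}")
print(f"{coverage(9, 1, 4, 0.25):.4f}")
```

Several different runs of integers can satisfy (1) at the same $\pi$, which is why further restrictions are needed to fix the end-points.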
The end-points $r_1$ and $r_2$ are not uniquely determined by (1). Four different further 
restrictions which do render them unique are considered: 

(1) The Clopper-Pearson system, $\delta_1(r)$ say, is determined by choosing 'central' 
acceptance regions $A_1(\pi)$, so that $r_1$ is the largest $r$ and $r_2$ the smallest $r$ with each tail probability 
not more than $\tfrac12\epsilon$. 

(2) The system $\delta_2(r)$ is determined by choosing acceptance regions $A_2(\pi)$ such that 

$$r_1 = 0, \quad \sum_{r=r_2+1}^{n} p_{n,r}(\pi) \leq \epsilon \quad (0 \leq \pi \leq \pi_{U0}); \qquad (2)$$

$$A_2(\pi) = A_1(\pi) \quad (\pi_{U0} < \pi \leq \tfrac12), \qquad (3)$$


* Now at Boulder Laboratories of the National Bureau of Standards, Boulder, Colorado 














424 Confidence intervals for a proportion 


where $\pi_{U0}$ is the Clopper-Pearson upper confidence limit for $r = 0$, determined from 
$(1-\pi_{U0})^n = \tfrac12\epsilon$, and by choosing $A_2(\pi)$ for $\pi > \tfrac12$ as the symmetrical regions consisting of the 
integers $n-r'$, where $r'$ is in $A_2(\pi')$, $\pi' = 1-\pi$. 

(3) The Sterne system $\delta_3(r)$ is determined by choosing acceptance regions $A_3(\pi)$ as those 
values of $r$ with the largest probabilities of occurring, i.e. the $r$'s are chosen in order, starting 
with the most probable and continuing in both directions from it until (1) is satisfied. If 
two values of $r$ have equal probabilities and both cannot be excluded from the acceptance 
region, then both are included. (The latter provision yields a larger acceptance region than 
necessary at a finite number of values of $\pi$, but it makes the confidence intervals closed 
without increasing their total length.) 





[Figure: confidence limits for $\pi$ (vertical axis) plotted against $r$ (horizontal axis), with the 
values 0.232 and 0.275 marked. Legend: × $\delta_1$ limits; ○ $\delta_3$ limits; ● $\delta_4$ limits where they differ 
from $\delta_3$.] 

Fig. 1. Confidence intervals $\delta_1$, $\delta_3$, $\delta_4$ for $n = 9$, $\epsilon = 0.10$. 


(4) The modified Sterne system $\delta_4(r)$ is the same as $\delta_3(r)$ except for the limits determined 
by the operation called substitution in § 2. We show in § 3 that the Sterne system has the least 
total length because $A_3(\pi)$ is minimum for each $\pi$, except for a set of zero measure. But this 
minimum property still holds if the substitution of one value of $r$ for another occurs at the 
smallest value of $\pi$ ($< \tfrac12$) consistent with (i) rather than at the value of $\pi$ where the two $r$'s 
become equally probable, subject to the restriction that no $r$'s other than the two of the 
substitution become involved. For example, in the case illustrated in Fig. 1 (to be discussed 
later) substitution of $r = 0$ for $r = 5$ may occur as early as $\pi = 0.232$ rather than at the value 
$\pi = 0.275$, where $p_{9,0}(\pi) = p_{9,5}(\pi)$. Since a substitution at $\pi < \tfrac12$ simultaneously determines 
an upper confidence limit for a small value of $r$ and a lower limit for a value nearer $\tfrac12 n$, the 
earlier substitution has the advantage of transferring a given subinterval to a larger $r$ 










Edwin L. Crow 425 


where it is relatively less important; this is the sole advantage of $\delta_4$ over $\delta_3$. Thus in Fig. 1 
the subinterval (0.232, 0.275) is transferred from $r = 0$ to $r = 5$, so that the 90 % confidence 
interval $\delta_4$ for $r = 0$ is (0, 0.232) and that for $r = 5$ is (0.232, 0.790). However, the natural 
irregularity of $\delta_3$ is augmented sufficiently by the above definition of $\delta_4$ to cause the 90 % 
interval to be longer than the corresponding 95 % interval in three independent cases for 
$n \leq 30$ ($n = 7$, $r = 5$; $n = 10$, $r = 5$; $n = 10$, $r = 7$), in one of which ($n = 10$, $r = 5$) the latter 
interval is covered. (No 95 % interval for $n \leq 30$ is longer than the 99 % interval.) Such 
interesting and theoretically permissible, but practically undesirable, results are prevented in 
$\delta_4$ and Table 1 by the additional restriction that no substitution is made before the point 
at which the length of an interval would equal that of a corresponding interval with larger 
commonly used confidence coefficient. 
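
Sterne's rule in (3) can be sketched in a few lines: values of $r$ are taken in decreasing order of probability until (1) holds. This is an illustration of the unmodified Sterne rule only; it ignores the tie provision and the substitution refinements of (4), and the function names are hypothetical.

```python
from math import comb


def pmf(n, r, pi):
    """Binomial probability of r successes in n trials."""
    return comb(n, r) * pi**r * (1.0 - pi) ** (n - r)


def sterne_region(n, pi, eps):
    """Acceptance region built from the most probable r downward until content >= 1 - eps."""
    order = sorted(range(n + 1), key=lambda r: pmf(n, r, pi), reverse=True)
    region, total = [], 0.0
    for r in order:
        if total >= 1.0 - eps:
            break
        region.append(r)
        total += pmf(n, r, pi)
    return sorted(region)


# n = 9, eps = 0.10: below pi = 0.275 the region holds r = 0, above it r = 5 instead.
print(sterne_region(9, 0.25, 0.10))
print(sterne_region(9, 0.30, 0.10))
```

The change of region between the two values of $\pi$ is the substitution of $r = 5$ for $r = 0$ discussed in the text for the case $n = 9$, $\epsilon = 0.10$.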





[Figure: probability (vertical axis) plotted against $\pi$ (horizontal axis).] 

Fig. 2. Probability that the Sterne intervals $\delta_3$ fail to cover the true $\pi$ ($n = 9$, $\epsilon = 0.10$). 


The four systems are defined above by direct satisfaction of condition (i) on acceptance 
regions. It can be readily shown that conditions (ii) and (iii) are also satisfied by $\delta_1$ and $\delta_2$. 
However, condition (iii) is not always satisfied by $\delta_3$ or $\delta_4$ for a reason to be discussed near 
the end of § 2. At least five such cases occur in $\delta_3$ and $\delta_4$ for $n \leq 30$: $\epsilon = 0.10$, $n = 20$, $r = 0$ 
and $r = 1$; $\epsilon = 0.01$, $n = 27$, $r = 2$; $\epsilon = 0.10$, $n = 28$, $r = 2$; $\epsilon = 0.01$, $n = 28$, $r = 0$. In the first case, 
for example, the $\delta_3$ confidence set for $r = 0$ consists of the two intervals (0, 0.127) and 
(0.141, 0.147). In order to obtain confidence sets which are intervals in such cases also 
without increasing the size of the acceptance regions, the definition of $A_3(\pi)$ is altered by 
replacing its least probable value of $r$ by the next less probable value if this can be done 
without violating condition (i) on $A_3(\pi)$. This has been possible in calculations of $\delta_3$ and $\delta_4$ 
for $n \leq 30$; e.g. in the above case the $\delta_3$ and $\delta_4$ intervals for $r = 0$ are (0, 0.127) and (0, 0.126) 
respectively by virtue of a confidence interval for $r = 6$ of (0.141, 0.500) rather than 
(0.147, 0.500). 

A similar definition of a confidence interval can be used for the parameter of other 
one-parameter discrete distributions. A table of $\delta_4$ for the parameter of the Poisson distribution 
is being constructed. 













2. CALCULATION OF TABLES 


Because of the asymptotic normality of the binomial distribution, the differences among 
the four systems defined above are of interest only for small sample sizes. For the purpose 
of comparison all four systems were calculated to at least three decimal places for $n \leq 20$ 
(and $\delta_4$ for $n \leq 30$), although several tables provide $\delta_1$ in this range as well as beyond (e.g. 
Hald, 1952). The values of $\pi$ for which the tail sums of the binomial distribution exactly 
equal a specified probability are fixed percentage points of the Incomplete Beta-Function 
given by Thompson (1941) (repeated by Pearson & Hartley, 1954) and Clark (1953). They 
can also be obtained, accurate to about three decimal places, up to $n = 150$ by linear 
interpolation in the tables prepared by the National Bureau of Standards (1949) and the U.S. 
Army Ordnance Corps (1952). By symmetry the acceptance regions need be determined 
only for $\pi \leq \tfrac12$. The confidence limits for $\delta_1$ and $\delta_2$ are given respectively by $100\epsilon/2$ and a 
combination of $100\epsilon/2$ and $100\epsilon$ percentage points of the Incomplete Beta-Function. 
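
Where tables of the Incomplete Beta-Function are not at hand, the Clopper-Pearson limits can be found by solving the tail equations directly. The sketch below does this by bisection with the standard library only; it is one possible numerical route and not the tabular interpolation described in the text, and all function names are hypothetical.

```python
from math import comb


def pmf(n, r, pi):
    """Binomial probability of r successes in n trials."""
    return comb(n, r) * pi**r * (1.0 - pi) ** (n - r)


def upper_tail(n, r, pi):
    """P{R >= r | pi}, increasing in pi."""
    return sum(pmf(n, j, pi) for j in range(r, n + 1))


def lower_tail(n, r, pi):
    """P{R <= r | pi}, decreasing in pi."""
    return sum(pmf(n, j, pi) for j in range(0, r + 1))


def solve(f, target, lo=0.0, hi=1.0, iters=60):
    """Bisection for f(pi) = target, with f monotone on [lo, hi]."""
    increasing = f(hi) > f(lo)
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def clopper_pearson(n, r, eps):
    """Central limits: each tail probability equals eps/2 at the corresponding limit."""
    lower = 0.0 if r == 0 else solve(lambda p: upper_tail(n, r, p), eps / 2.0)
    upper = 1.0 if r == n else solve(lambda p: lower_tail(n, r, p), eps / 2.0)
    return lower, upper


print(clopper_pearson(9, 0, 0.10))
```

For $r = 0$ the upper limit can be checked against the closed form $(1-\pi_{U0})^n = \tfrac12\epsilon$ given in § 1.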

In contrast, the confidence limits for $\delta_3$ and $\delta_4$ correspond to no fixed percentage points 
of the Incomplete Beta-Function and must be calculated by frequent reference to the 
individual terms as well as the sums of terms; these are both tabulated in the National 
Bureau of Standards Tables for $\pi = 0.01\,(0.01)\,0.50$. Beginning with $\pi = 0$, we determine the 
complements of the acceptance regions $A_3(\pi)$ (or $A_4(\pi)$), that is, the values of $r$ with the 
smallest probabilities $p_{n,r}(\pi)$ such that $\sum_r p_{n,r}(\pi) \leq \epsilon$. For $\pi$ sufficiently near zero all such $r$ 
are consecutive integers ending in $n$. Hence we enter the table of sums and successively 
determine: 

(1) $\pi_{L1}$ such that $\sum_{r=1}^{n} p_{n,r}(\pi) \leq \epsilon$ in $\pi_{L0} = 0 \leq \pi \leq \pi_{L1}$, 

(2) $\pi_{L2}$ such that $\sum_{r=2}^{n} p_{n,r}(\pi) \leq \epsilon$ in $\pi_{L1} < \pi \leq \pi_{L2}$, 

up to the largest such value, say $\pi_{L,k-1}$, less than the $\pi_{U0}$ at which $p_{n,0}(\pi)$ becomes sufficiently 
small to be included in the sums $\sum_r p_{n,r}(\pi)$. The operation thus performed of transferring 
one value of $r$ from the complement of $A_3(\pi)$ to $A_3(\pi)$ (no other value of $r$ being involved) 
is called elimination. The point $r = 0$ enters the complement of $A_3(\pi)$ by addition at the 
$\pi = \pi_{U0}$ such that 

$$p_{n,0}(\pi) + \sum_{r=k}^{n} p_{n,r}(\pi) = \epsilon \qquad (4)$$

if any such $\pi$ exists, or if not, by substitution for $r = k$ at the $\pi = \pi_{U0} = \pi_{Lk}$ such that 
$p_{n,0}(\pi) = p_{n,k}(\pi)$. The calculation for $\delta_4$ differs only in that the substitution of $r = 0$ for $r = k$ 
occurs as soon after $\pi = \pi_{L,k-1}$ as 

$$p_{n,0}(\pi) + \sum_{r=k+1}^{n} p_{n,r}(\pi) \qquad (5)$$

decreases to $\epsilon$. 
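
The two candidate substitution points can be located numerically for the case $n = 9$, $\epsilon = 0.10$, $k = 5$ quoted in the text. The sketch below assumes that expression (5) sums over $r = k+1, \ldots, n$ and uses plain bisection; the bracketing end-point 0.210 is the preceding elimination point given in the text, and the function names are illustrative.

```python
from math import comb


def pmf(n, r, pi):
    """Binomial probability of r successes in n trials."""
    return comb(n, r) * pi**r * (1.0 - pi) ** (n - r)


def expr5(n, k, pi):
    """Expression (5): p_{n,0} plus the upper tail starting at r = k + 1 (assumed limits)."""
    return pmf(n, 0, pi) + sum(pmf(n, r, pi) for r in range(k + 1, n + 1))


def bisect(g, lo, hi, iters=60):
    """Root of g on [lo, hi], assuming g changes sign exactly once there."""
    g_lo_positive = g(lo) > 0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if (g(mid) > 0) == g_lo_positive:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


n, k, eps = 9, 5, 0.10
# Sterne's substitution point: the two probabilities become equal.
equal_prob = bisect(lambda p: pmf(n, 0, p) - pmf(n, k, p), 0.210, 0.5)
# Earliest admissible substitution point: expression (5) decreases to eps.
earliest = bisect(lambda p: expr5(n, k, p) - eps, 0.210, equal_prob)

print(f"equal probabilities at pi = {equal_prob:.4f}")
print(f"expression (5) reaches eps at pi = {earliest:.4f}")
```

To three decimals these agree with the values 0.275 and 0.232 given in the text for the Sterne and the modified substitution respectively.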

The calculations of $\delta_3$ and $\delta_4$ are illustrated in Figs. 1 and 2 for $n = 9$, $\epsilon = 0.10$; Fig. 1 also 
shows the corresponding Clopper-Pearson intervals $\delta_1$. There are eliminations at $\pi$ values 
of 0.012, 0.061, 0.129, 0.210, 0.390 and 0.485 (lower confidence limits for the eliminated 
values of $r$), a substitution at 0.275 (where $p_{9,0} = p_{9,5}$) for $\delta_3$ or 0.232 for $\delta_4$, and an addition 
at 0.391. In Fig. 2 each curve is labelled with the values of $r$ whose probability sum is 
plotted. As indicated by dashed extensions of the probability curves, the substitution could 
be made, with the same minimum length of acceptance region, as late as 0.301 with $p_{9,5}$ in 






the sum or as early as 0.232 with $p_{9,0}$ in the sum. As shown in Fig. 1 the substitution 
simultaneously determines $\pi_{L5}$ as the lower confidence limit for $r = 5$ and its equal, $\pi_{U0}$, as the 
upper confidence limit for $r = 0$. 

The calculation of $\delta_3$ or $\delta_4$ may proceed from any value of $\pi$ by determining: (1) the next 
value of $\pi$, if any, at which addition is possible; (2) if not (as is usually the case), the next 
value of $\pi$, if any, at which substitution is possible; and (3) if neither is possible, the next 
value of $\pi$ at which elimination is necessary. This procedure assures that the shortest 
possible acceptance region is maintained for all values of $\pi$. For example, if $\pi_{Lk}$ is 
determined by substitution, we determine next whether any $\pi$ satisfies 

$$p_{n,0}(\pi) + p_{n,1}(\pi) + \sum_{r=k+1}^{n} p_{n,r}(\pi) = \epsilon. \qquad (6)$$

If so, the solution is $\pi_{U1}$ by addition. If not, we determine whether $p_{n,1}(\pi)$ decreases to 
$p_{n,k+1}(\pi)$ before (5), which has a minimum, increases to $\epsilon$, and in the affirmative determine 
$\pi_{U1} = \pi_{L,k+1}$ for $\delta_4$ by substitution where (5) with $p_{n,k+1}(\pi)$ replaced by $p_{n,1}(\pi)$ decreases 
to $\epsilon$. In the negative $\pi_{L,k+1}$ is determined where (5) increases to $\epsilon$. We continue in this way 
up to $\pi = \tfrac12$, determining whether an addition, substitution, or elimination occurs next. 
The $\delta_3$ or $\delta_4$ confidence interval for $\pi$ when $r$ 'successes' are observed is then $(\pi_{Lr}, \pi_{Ur})$. 
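
As a rough cross-check on the step-by-step procedure, the unmodified Sterne limits can also be obtained by brute force: build the acceptance region on a fine grid of $\pi$ and collect the grid points whose region contains the observed $r$. This is a sketch of a different, cruder route than the one described above, it covers only the Sterne system without the tie or substitution refinements, and the grid width of 0.001 is an arbitrary choice.

```python
from math import comb


def pmf(n, r, pi):
    """Binomial probability of r successes in n trials."""
    return comb(n, r) * pi**r * (1.0 - pi) ** (n - r)


def sterne_region(n, pi, eps):
    """Most probable values of r, taken in order until their total reaches 1 - eps."""
    order = sorted(range(n + 1), key=lambda r: pmf(n, r, pi), reverse=True)
    region, total = [], 0.0
    for r in order:
        if total >= 1.0 - eps:
            break
        region.append(r)
        total += pmf(n, r, pi)
    return region


def sterne_limits(n, r, eps, steps=1000):
    """Smallest and largest grid values of pi whose acceptance region contains r."""
    grid = [i / steps for i in range(steps + 1)]
    inside = [p for p in grid if r in sterne_region(n, p, eps)]
    return min(inside), max(inside)


lower, upper = sterne_limits(9, 5, 0.10)
print(lower, upper)
```

For $n = 9$, $r = 5$, $\epsilon = 0.10$ the grid limits fall within one grid step of the Sterne interval (0.275, 0.790) implied by the text.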

The reason for the occurrence in the initial definition of $\delta_3$ or $\delta_4$ of confidence sets 
containing two intervals may now be discussed. The probability sums such as in Fig. 2 that 
include two tails always have a smooth minimum which occurs at a value of $\pi$ different 
from that at which its largest term (from the left tail) becomes equal to the next largest 
term (from the right tail). All calculations support the conjecture that the latter value of $\pi$ 
is always larger ($\pi < \tfrac12$). Then $\epsilon$ may be just such a size that the two-sided sum increases 
from its minimum through $\epsilon$ where its largest term is still from the left tail and hence should 
be eliminated rather than a term on the right. This would give two intervals of $\pi$ with $A(\pi)$ 
containing the corresponding left-tail value of $r$. For each two-sided sum there would always 
be such an $\epsilon$, but for any particular $\epsilon$ it would occur only rarely and can evidently be avoided 
as specified in § 1. 

Table 1, comprising the $\delta_4$ system of confidence intervals for $n = 1\,(1)\,30$ and confidence 
coefficients 0.90, 0.95 and 0.99, was calculated in the above manner using 2-point to 7-point 
Lagrangian interpolation in the National Bureau of Standards Tables. In addition, Robert S. 
Gardner checked the entire calculation independently on a high-speed electronic 
calculator using automatic coding. The total time with the latter was about 50 hr. spread over 
two or three weeks, while desk calculation required about 200 hr. spread over many months. 
All differences (mainly of 1 in the third decimal place due to inadequacy of linear 
interpolation) were reconciled, so that Table 1 should have no error. The tables of Thompson and 
Clark were used for checking the limits that arise with all the probability $\epsilon$ in one tail. 


3. COMPARISON OF SYSTEMS OF CONFIDENCE INTERVALS 


The system $\delta_2$ is shorter than $\delta_1$ under any conceivable definition of 'shortness' because all 
its intervals are contained in those of $\delta_1$ and some of $\delta_1$ contain points outside the 
corresponding intervals of $\delta_2$. On the other hand, though the $\delta_1$ and $\delta_2$ intervals more often than 
not contain the corresponding $\delta_4$ intervals, there are fourteen intervals out of the total of 
690 $\delta_4$ intervals calculated for $n \leq 20$ which contain the corresponding intervals of both $\delta_1$ 
and $\delta_2$, and 18 (63) are longer than the corresponding intervals $\delta_1$ ($\delta_2$). 










Table 1. Table of confidence limits for a proportion 


Calculated by Edwin L. Crow, Eleanor G. Crow and Robert S. Gardner according to a modification of a 
proposal of Theodore E. Sterne. 





| Confidence coefficient (%) 














n fr 
| 
1 0) 
1 
(2 0 
1 
2 
| | 
‘ ae 
3 0 
1 
2 
3 
4 0 
1 
2 
3 
4 
| | 
Prete 
| 5 0 
| 1 
2 
3 
4 
| 5 
= 
| | 
i on 
| 1 
2 
3 
4 
5 
6 
7 0 
‘ 
2) 
3 
4) 
5 | 
6 | 
7 





0-000 
100 


0-000 


95 


0-000 
-050 


0-000 
‘007 
053 
-129 
*225+ 
“341 
-446 
623 


99 


0-000 
“010 


0-000 
005+ 
100 


0-000 
-003 
-059 
+2157 


0-000 
-003 


0-000 
“002 
033 
*106 


999 


~-- 


+398 


0-000 
“002 
027 
-085- 
‘173 


“294 
464 


0-000 
‘001 
-023 
‘071 
*142 
236 
“B57 
-500 


10 














Confidence coefficient (%) 


> 


“000 
“012 
061 
129 
210 


+232 
*390 
*485- 
-609 
‘768 


95 


0-000 
-006 
046 
‘lll 
*193 


0-000 


006 
-041 
‘098 
169 


+251 
+289 
-442 
+557 


‘711 


0-000 


“005+ 
037 
‘087 
*150 


+222 
*267 
381 
‘397 
603 


‘733 


0-000 
-005- 
‘033 
‘079 
“135+ 


-200 
+250 
+333 


0-000 
‘001 
-020 
‘061 
121 


*198 
-293 
-410 
*549 


0-000 


‘001 
“O17 
053 
*105+ 


‘171 
-250 
+344 
-402 
*598 


0-000 
‘001 


0-000 
“001 
*014 
043 
-084 


134 
194 
262 








            Confidence coefficient (%)
  n    r       90       95       99

 11    8     0.423    0.369    0.340
       9      .577     .500     .407
      10      .685-    .631     .500
      11      .803     .750     .641

 12    0     0.000    0.000    0.000
       1      .009     .004     .001
       2      .045+    .030     .013
       3      .096     .072     .039
       4      .154     .123     .076
       5      .184     .181     .121
       6      .271     .236     .175-
       7      .294     .294     .235-
       8      .398     .346     .302
       9      .500     .450     .321
      10      .602     .550     .445+
      11      .706     .654     .555-
      12      .816     .764     .679

 13    0     0.000    0.000    0.000
       1      .008     .004     .001
       2      .042     .028     .012
       3      .088     .066     .036
       4      .142     .113     .069
       5      .173     .166     .111
       6      .246     .224     .159
       7      .276     .260     .213
       8      .379     .327     .273
       9      .455+    .413     .302
      10      .530     .480     .406
      11      .621     .566     .477
      12      .724     .673     .571
      13      .827     .775-    .698

 14    0     0.000    0.000    0.000
       1      .007     .004     .001
       2      .039     .026     .011
       3      .081     .061     .033
       4      .131     .104     .064
       5      .163     .153     .102
       6      .224     .206     .146
       7      .261     .206     .195-














The observed proportion in a random sample of size n is r/n. The table gives the lower confidence limit for 
the population proportion 7, as a function of n andr. 
The upper confidence limit = 1 — (lower confidence limit, entered with n—r instead of r). 
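
The rule for the upper limit can be illustrated with the 95 % lower limits for $n = 12$ read from the table above. The list below is a transcription of that column and the function name `interval` is hypothetical; only the symmetry rule stated in the table note is used.

```python
# Lower 95 % confidence limits for n = 12, r = 0, 1, ..., 12, transcribed from Table 1.
lower_95_n12 = [0.000, 0.004, 0.030, 0.072, 0.123, 0.181, 0.236,
                0.294, 0.346, 0.450, 0.550, 0.654, 0.764]


def interval(lower, r):
    """Confidence interval for r successes, using upper(r) = 1 - lower(n - r)."""
    n = len(lower) - 1
    return lower[r], 1.0 - lower[n - r]


lo, hi = interval(lower_95_n12, 3)
print(f"n = 12, r = 3: ({lo:.3f}, {hi:.3f})")
```

Because only lower limits are tabulated, every upper limit is recovered by entering the table with $n - r$ in place of $r$.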











th 

































































Table 1 (cont.) 
1 of a 
Confidence coefficient (%) Confidence coefficient (%) Confidence coefficient (%) | 
n r ——_—__—___— —_—_—_—_| n r —_—_—_—__——_—— —j n r —— —__— 
:(%) 
thay 90 95 99 90 95 99 90 95 99 
99 x oe fod ae — 
i 14 8 | 0:355- 0-312 0-249 117 0| 0-000 0-000 0-000 | 19 5] 0-130 0-110 0-073 
oar 9| -406 -371 +286 1| -006 -003 ~~ -001 6 151 147 103 
340 2| -032 -021 -009 7| -209 -150 -137 
407 10| -422 -389 -364 3| -067  -050 -027 s 238 +222 173 
wa 11| -578 +500  -392 4| -107  -085- = -052 9 265+ = +232 212 | 
12| -635- -611 -500 
641 13 | -739 -688 -608 5 | -140 +124 -082 10 | -337 +312 -218 
14| -837 -794 -714 6| -175+ -166 = -117 11 | -386 -345- -293 
iy 7 | -225+ -166 “155+ 12 -386 365-—--3057 
000 a a = 8| -277 +253 +197 13| -440 -426 -383 
001 15 01] 0-000 0-000 0-000 9| -290 -254 +242 14) -560 -500  -436 
013 ‘sl 
039 ; a oe 10| -364 -337 +242 15| -614 -574 = -485- 
076 3 076 057 031 11 -432 -406 -338 16 | -663 “635+ = --545- 
4| +122 -097 -059 12| -500 -456  -380 17| -735- -684 -617 
121 nae 13 568 511 413 18 791 768 695- 
175- ‘ - pate 
pial eae ee 14 636 583 500 19 870 850 782 
302 6| -205+ +191 -135- 15| -710 -663 -587 —— ‘an 2 
32] 7 *247 “191 “179 1 = " ~ 
: 8| -325+ -294  -229 16] -775- “746-654 | 20 0 | 0-000: 0-000 0-000 
445+] | 9| -325+ -332 273 bP ~ AT nh as 1) -005+ -003 —-001 
555- ms +. ee” Tibi 2| +027 -018 -008 
pos , 3 -056 -042 023 
. 9 . . C . 2 
- ” prone ” 328 11g 0 | 0-000 9-000 0-000 4) -090  -071 -044 
11 | -500 -448 -373 
= 12 aon 1/ -006 -003 ~ -001 
} “600 "552 “461 2 -030 -020 -008 5 126 104 -069 
° = ol HC | ” 
‘000 | 13 675 631 539 pane 6 | -141 -140 -098 
14| -7583 -698 627 3| -063 -047 -025+ ‘ 
‘001 Yr 4 101 -080 -049 7 201 -143 -129 
O12 e 8| -221 209 = --163 
pre 15 *846 -809 *727 5 -135- 116 077 9 -255- +229 -200 
aeineoea i SAB 10 | -325- -298 —--209 
a 7| -216 -157 -145+ ct “a. an aan 
aan 16 0| 0-000 0-000 0-000 8| -257 +236 -184 ~ rie a 
= 1| -007  -003 -001 2 an! an 12| -367 -351 -293 
213 es a 13| -422 -411 -363 
po 2| -034 -023 -010 tog 
‘208 3| -071 -053 —-029 10| +349 -325- -298 See sake Ree 
4] -114 “090 055+ 11 -416 375- +314 15 578 -533 424 
-406 12 -464 “381 -318 16 633 +589 500 
571 6| -189 +178  -125+ 14| -581 556-466 18 | -745+ -707 -625+ 
-698 a — =. —- 19| -797 -778 +707 
8| -299 +272 -212 15 | -651 619 +534 
) 9| -305+ +272 -261 16 | -723 675+ -603 20| -874 +857  -791 
000 | | 17| -784 -758  -682 
001 10 | -381 852  —--295+ 18 | -865+ -843 -772 = 
ll 11} -450 -429 -357 | 21-0 | 0-000 §=60-000 ~=—-0-000 
12 +550 -500 *421 1 005+ -002 -000 
033 5 
-064 13 | -619 ‘571 -475+1 19 0! 0-000 0-000 0-000 2)\ -026 017 -007 
14! -695- -648 +549 1/ -006 -003 ~~ -001 3| -054 -040 -022 
102 | | 2| -028 -019  -008 4| 086 -068 -041 
146 |} 15| -765- +728  -643 3| -059 -044 -024 
-195- 16| -853 -822 -736 4} -095+ -O75+ -046 5| -121 099 065+ 













Table 1 (cont.) 





22 




90 


0-130 
“191 
‘191 
*245- 





*306 
| *306 
| °353 
407 


| 0-000 
-005- 


0-000 
-005- 
023 
-049 
‘078 





95 


0-132 
“137 
*197 
*213 


-276 
-276 
*338 
-398 
“449 


“494 
“545 
-602 
662 
“724 


*787 
*863 


0-000 
“002 
‘016 
-038 
-065- 


094 
126 
132 
187 


-205+ 


260 
264 
326 


0-000 
“002 
‘016 
‘037 
-062 


Confidence coefficient (%) 


99 


0-092 
-122 
*155- 


“189 


201 
*257 
*283 
*339 
+347 


-409 
-466 
-534 
‘591 
653 


‘717 


‘799 


0-000 
-000 
-007 
‘021 
‘039 


062 
“088 
“116 
147 
“179 


“194 
+242 
273 
“318 
334 


396 
*450 
495+ 
546 
604 


666 
*727 
*806 


0-000 
-000 
‘007 
-020 
-038 





24 


eon = © 


eCrmMrnauw 


20 





21 
22 | 
23 | 
24 | 


Confidence coefficient (%)} 


90 


0-000 
-004 
022 
‘047 
-075- 


95 


0-090 
*120 


0-000 
“002 
-O15* 
-035- 
“059 


‘086 
“h16- 
*122 
-169 
‘191 


*234 
-246 
*308 
339 
*347 


396 
443 
-500 
-557 


604 


653 
-692 
‘754 
*809 
“878 


99 


0-059 
-084 
“Kl 
-140 
171 


187 
229 
265+ 
298 
323 


384 
420 
-429 
-500 


‘571 


614 
“677 
*735- 
“813 


0-000 
-000 
‘006 
‘O19 
-036 


“057 
“0380 
-106 
*133 
*163 


181 
*216 
257 
280 
313 


+362 
364 
-416 
-464 
536 


584 
636 
*687 
‘741 
*819 


Confidence coefficient (%) 








n F¢ se 
90 95 99 
25 0| 0-000 0-000 0-000 
1} -004 -002 -000 
2| -021. -014 -006 
3| -045- -0384 -018 
4| -072 -057 -034 
5; -101  -082 -054 
6| -101  -110 -077 
7| +158 118-101 
8| -158 161-127 
9| +214 -185+ —-155+ 
10) +246 «6-222, -175+ 
11 -255- +288) -205+ 
12-307 296 = -245+ 
13-360 «317. -245+ 
14, -389 +336 ~—-305~ 
15| -389 -384 -342 
16| -432  -431 352 
17| -500 -475- —-408 
18| -568 -525+ -451 
19| -611 -569 ~—-500 
20| -638  -616 -549 
21| -693 -664 597 
22| -745+ -697 -648 
23| -786 -762 -695 
24| -842 -815-  -755- 
25| -899 -882 -825 
26 0} 0-000 0-000 0-000 
1| -004 -002 -000 
2} -021 -014 -006 
3| -043 -032 -017 
4| -069 -054 -033 
5| -097 -079 -052 
6| -097 -106 -073 
7| +151 -114~— -097 
8| -151 -154 +122 
9| -209 +180 -149 
10| -233 +212 170 
11 | +247 +230 +195 
12| -209 +282 -234 
13| -342 +282 -234 
14| -342 -325+ -298 
15| -377 -374 +822 
16| -419 -421 342 
17| -460 -458 -393 
18| -540 -494  -438 
19| -581 535-474 























the population proportion $\pi$, as a function of n and r.
The upper confidence limit = 1 - (lower confidence limit, entered with n-r instead of r).



































































EDWIN L. CROW
Table 1 (cont.) 
—_— | Confidence coefficient (%) Confidence coefficient (%) Confidence coefficient (%) 
99 — 6 pee ee ee ct Secertes Mhtn lee LEE PE ik Se SE So aL 
. 90 95 99 90 95 99 90 95 99 
000 | Ae Sane ES Re. 
000 
006 26 20/| 0-623 0-579 0-513 | 28 5] 0-089 0-073 0-048 | 29 20/ 0-537 0-500 0-438 
018 21 658 -626 558 6| -089 -098  -068 21 | -575+ -549 = -477 
034 | 22} -701 -675-  -607 7| -1389 +106 = -089 22| -615- -587 =-523 
23 753-718 658 8| -139 -142 +112 23| -655+ -626 -562 
054 24! -791 +770 702 9| -197 -170 -137 24| -697 -661 -603 
077 ; 
101 | 25 849 820 766 10 208 192-162 25| -721 -701 -646 
127 | 26 903 886 830 il 232 217 «175+ 26| -775+ -749 -684 
155+ | 12 "284 258 *214 27 811 -789 °737 | 
aN a eee 13 -310 259 218 28 | “866 +834 “789 | 
75+ 14| -312 307 272 29| -914 -897 = -840 
piel 27 0 | 0-000 0-000 0-000 | | 
aieie 1| -004 -002 -000 15| -355- -355- —-272 = a 
><a | 2) +020 -013 -006 16 | -396 -381 +323 | | 
peal 3| -042 031 -017 17| -435+ -384 -364 | 30 0 | 0-000 0-000 0-000 | 
_ | 4| -066 -052 -032 18| -473 -424 -364 1 | 004 = 002-000 | 
a | | 19| -527 -463 -408 2} 018 012 -005+| 
359 | 5| -093 -076 -050 . . 3 | a 
pn oS 6| ee ae ees 20] 565-537-449 4| -059 -047 -028 | 
451 7| -145+ +110 — -093 - =. SS | | 
500 | 8) 145+ +148 9-117 22) -645+ 616-551 5| -083  -068 —-045-| 
; | 9| -204 -175- -143 23 | -688  -643 -592 6| -083 -091 -063 
549 | | 24, -716 = -693 636 7| +129 -100 -083 | 
“te | 
rs 29 -202 -166 8 *129 “131 104 | 
‘597 | = = bmn pond 25 768 741 “677 9| -182 163 -127 | 
648 | | ose ree a 26 799 = +783 -728 | 
695+ "2 "268 “so 27| -861 -830 -782 | as — 
755-| 13| -326 -269 -225- 28| -911 -894 -838 10; “182 175+ -151 | 
14| -326 -316 -284 11} -219 -205+ -151 
825- | <3 Sa te 12 | *265- +236 198 | 
| hots -36 “ | :265- +244 +206 | 
15 | 365+ 364-298 1 29 0 | 0-000 0-000 +0000 oly Ra OE Ri te 
| 16 | -407 -402 +332 ;| «a. <<. <a 14| -295 292 244 
000 | 17) -447 -430 — -383 2} 018  -012  -005+ | 
-000 | 18 -500 +437 -413 3 039 029 015+ 15 | -336 -324 256 | 
006 | | | 19 +553 500 419 poe ’ ‘ 16 376 -324 308 | 
. 7 4| -062 049 030 
017 |} 17| -416 364 +329 | 
-033 | 20 593 -563 461 5 -086 -070 -046 18 446 -403 *B45- 
21 *635- *585> -539 6 -086 -094 -065+ 19 -476 -440 -388 | 
052 | 22 -674 636 581 7 +134 -103 -086 
073 | 23| -709 -684 -616 8| +134 -136 -108 20| -508 -476 -430 
-097 24| -761 ‘731 668 9| -189 166 = +132 21 | -545- +524 -462 
-122 | 22| +584 +560 -495- 
-149 | 25| +796 -777  -703 10; -189 +184 +157 23| -624 -597 -531 | 
) | 26 | -855- -825- —-775+ M1) -225- 211 -165+ 24| -664 -636 -570 | 
-170 27 -907 -890  -834 12| -276 +247 -206 
-195- 13| +294 -251 -211 2 05+ 676 612 | 
234 ——<$j 14| +303 -209-260 od i BER 
2 [sb ai “75 -690 | 
= 28 0 0:000 0-000 0-000 15| -345- -339 268 27 pn po eo 
7 1|/ -004 -002 -000 16| -385+ -339 -316 28 | p+ gs RIG 
399 2} -019 -013 -005+ 17| -425- -374 +346 29 y ; 794 | 
342 ) 3} -040 030 —-016 18| -463 -413 -354 a | 
393 4) -064 -050 -031 19 | -500 “451 +397 30) -917 “900 849 | 
-438 fummert ek y Sie s —_____— — 
The observed proportion in a random sample of size n is r/n. The table gives the lower confidence limit for the population proportion $\pi$, as a function of n and r.
The upper confidence limit = 1 - (lower confidence limit, entered with n-r instead of r).












Neyman’s (1937) definition of the shortest system and the short unbiased system of 
confidence intervals, and Scheffé’s (1942a) added definition of the shortest unbiased 
system, deal with such less clear-cut comparisons in the case of continuous variables with 
probability densities. However, these definitions have not been applied to discrete variables 
except indirectly, by changing the given variable to a continuous one by adding an auxiliary 
variable with a rectangular distribution, as reviewed by Pearson (1950). In particular it 
can be easily shown that none of the four systems considered in this paper is either shortest 
or unbiased. While it is true that shortest systems generally do not exist even for continuous variables, Eudey (1949) obtained shortest unbiased systems for $\pi$ by the above device. Tables of these intervals do not appear to be available.

The confidence interval systems $\delta_3$ and $\delta_4$ have an optimum shortness property in a geometric sense. If an acceptance region $A(\pi)$ of any system is considered to have length equal to $r_2 - r_1 + 1$, then as $\pi$ varies from 0 to 1, the acceptance regions sweep out a region in the $(r, \pi)$ plane composed of not more than $2n + 1$ rectangles which may be called the confidence belt. Thus in Fig. 1 the rectangles containing $A_4(\pi)$, 17 in all, would be, reading from bottom to top:

    -0.5 < r < 0.5,  0 < \pi < 0.012;       -0.5 < r < 1.5,  0.012 < \pi < 0.061;  ...;
    -0.5 < r < 4.5,  0.210 < \pi < 0.232;    0.5 < r < 5.5,  0.232 < \pi < 0.390;
     0.5 < r < 6.5,  0.390 < \pi < 0.391;    1.5 < r < 6.5,  0.391 < \pi < 0.485;
     1.5 < r < 7.5,  0.485 < \pi < 0.515;  ...;  8.5 < r < 9.5,  0.988 < \pi < 1.000.


If the lower and upper confidence limits are denoted by $\pi_{Lr}$ and $\pi_{Ur} = 1 - \pi_{L,n-r}$, then the area of the confidence belt is

\[
n + 1 - 2(1\cdot\pi_{L1} + 1\cdot\pi_{L2} + \ldots + 1\cdot\pi_{Ln}) = \sum_{r=0}^{n} (\pi_{Ur} - \pi_{Lr}),
\]


that is, the total length of the n + 1 confidence intervals. Now $A_3(\pi)$ and $A_4(\pi)$ are chosen so that their length is a minimum whatever $\pi$ (except for a set of measure zero), for a given probability $1 - \epsilon$ is attained with as few terms as possible by including the largest possible terms. Since the area of the confidence belt is the integral of the length of $A(\pi)$ over the fixed interval (0, 1), it is as small as possible with $A_3(\pi)$ or $A_4(\pi)$. Thus among all possible systems of confidence intervals, no system has shorter total length than the systems $\delta_3$ and $\delta_4$.

The total length of the system $\delta_4$ in Fig. 1 is

\[
2(0.232 + 0.379 + 0.454 + 0.481 + 0.558) = 4.208.
\]
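The arithmetic behind the total length, and the mean length obtained by weighting all n + 1 values of r equally, can be checked with a short Python sketch. The variable names are illustrative; the five lengths are those appearing in the sum above for n = 9, and the factor 2 reflects the symmetry between r and n - r.

```python
# Interval lengths for r = 0..4 as they appear in the sum above (n = 9);
# by symmetry the intervals for r = 5..9 have the same lengths.
half_lengths = [0.232, 0.379, 0.454, 0.481, 0.558]
n = 9

total_length = 2 * sum(half_lengths)    # area of the confidence belt
mean_length = total_length / (n + 1)    # all r equally weighted

print(round(total_length, 3), round(mean_length, 4))
```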


Its 'mean length' $d_4$ (all r's equally weighted) is 0.4208, while the mean length $d_1$ of $\delta_1$ is 0.4705. The mean lengths $d_i$ are tabulated below as functions of sample size for $\epsilon = 0.10$. The differences $d_1 - d_2$ are positive and decrease monotonely after a maximum at n = 3. The differences $d_2 - d_3$ are positive beyond n = 4 and oscillate, larger values being associated with 'additions' in the calculation of $\delta_3$. The differences $d_1 - d_3$ likewise oscillate, with an absolute maximum of 0.087 at n = 3 and an absolute minimum in this range of 0.015 at n = 26. The percentage reduction in mean length from $\delta_1$ to $\delta_3$ or $\delta_4$ varies from a minimum of 5% at n = 1 to a maximum of 12% at n = 3, 5 and 6 and is 6% at n = 30. The differences $d_1 - d_3$ for $\epsilon = 0.05$ and $\epsilon = 0.01$ are essentially the same as for $\epsilon = 0.10$ beyond n = 6; the percentage reductions are then less than for $\epsilon = 0.10$ but decrease more slowly (to 4% at n = 30 for $\epsilon = 0.01$). The large-sample confidence intervals based on the normal approximation to the binomial have, for n = 30 and $\epsilon = 0.10$, a mean length that is 0.013 less than $d_3$ and 0.029 less than $d_1$.
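The percentage reductions quoted in this paragraph can be recomputed from the tabulated mean lengths. The sketch below is only a check on the arithmetic: the dictionary names are illustrative, and the values are the first two rows of the table of mean lengths ($\epsilon = 0.10$) at the sample sizes mentioned in the text.

```python
# Mean lengths for eps = 0.10 taken from the table of mean lengths:
# d_first is the first row, d_short the second row (the shortest systems).
d_first = {1: 0.950, 3: 0.740, 5: 0.612, 6: 0.565, 30: 0.257}
d_short = {1: 0.900, 3: 0.653, 5: 0.540, 6: 0.498, 30: 0.242}

def pct_reduction(n):
    """Percentage reduction in mean length at sample size n."""
    return 100.0 * (d_first[n] - d_short[n]) / d_first[n]

for n in sorted(d_first):
    print(n, round(pct_reduction(n)))
```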



































    n              1      2      3      4      5      6      7      8
    d_1          0.950  0.834  0.740  0.667  0.612  0.565  0.528  0.497
    d_3 = d_4     .900   .755   .653   .604   .540   .498   .490   .451
    d_1 - d_2     .050   .079   .087   .063   .060   .042   .033   .030
    d_2 - d_3     .000   .000   .000   .000   .012   .025   .005   .016

    n             10     12     14     16     18     20     25     30
    d_1          0.448  0.410  0.380  0.355  0.335  0.317  0.283  0.257
    d_3 = d_4     .417   .373   .350   .328   .306   .296   .267   .242
    d_1 - d_2     .023   .017   .013   .010   .008   .005   .004   .003
    d_2 - d_3     .008   .020   .017   .017   .021   .016   .012   .012
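As a rough cross-check on the first row of the table, the sketch below computes mean lengths of central (Clopper-Pearson) intervals. That the first system is the Clopper-Pearson one is an assumption made here, since its definition lies outside this section; the function names are illustrative and the limits are found by plain bisection rather than from tables of the incomplete beta function.

```python
from math import comb

def binom_tail_ge(n, r, p):
    """P(X >= r) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(r, n + 1))

def cp_lower(n, r, eps):
    """Central lower limit: solves P(X >= r | p) = eps / 2 by bisection."""
    if r == 0:
        return 0.0
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if binom_tail_ge(n, r, mid) < eps / 2:
            lo = mid   # tail probability too small: the limit lies above mid
        else:
            hi = mid
    return (lo + hi) / 2

def cp_mean_length(n, eps):
    """Mean interval length with all r = 0..n equally weighted."""
    lengths = []
    for r in range(n + 1):
        lower = cp_lower(n, r, eps)
        upper = 1.0 - cp_lower(n, n - r, eps)   # complement rule
        lengths.append(upper - lower)
    return sum(lengths) / (n + 1)

for n in (1, 2, 3):
    print(n, round(cp_mean_length(n, 0.10), 3))
```

Under this assumption the values for n = 1, 2 and 3 can be compared directly with the first three entries of the first row.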














The mean lengths d; may be compared by graphical integration with the like mean length 
of Eudey’s shortest unbiased system for the case he gives explicitly (Graph 13), n=5, 
1—e€ = 0-80. The mean lengths of 6, and Eudey’s system are the same to graphical accuracy, 
0-437; this is 7% less than d, and 17 % less than d,. 

The advantage of 6, over 6, can be measured for given e and n by comparing the averages 
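over all r from 1 to $[\tfrac12 n]$ of the ratio of the length $d_{i,r}$ of the confidence interval for a given r divided by r/n, that is,

\[
R_i = \frac{1}{[\frac12 n]} \sum_{r=1}^{[\frac12 n]} \frac{d_{i,r}}{r/n},
\]

where $[\tfrac12 n]$ denotes the largest integer contained in $\tfrac12 n$. A few values are indicated below for $\epsilon = 0.10$: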




















» 1 « + we is | 20 25 | 30 
« pS ae ¥ | | 
| | | | 
_R, | 254 | 2-09 | 194 | 1-69 1-59 | 1-45 
R,-R, 0-29 | O11 | O12 | O12 | 0-08 | 0-07 
R,-R, | -0-:05 0:04 | 007 | 0-03 | 0-06 | 0-06 
| | | | 





The negative value of R,—R, results from the necessary omission of the term for r=0 in 
the averaging. 

Confidence intervals are often chosen by minimizing their expected length irrespective of the parameter value (Kendall, 1946, p. 72; Scheffé, 1942b, 1943). A disadvantage of this criterion is its dependence on the particular form of the parameter, e.g. $\pi$ or $2\arcsin\sqrt{\pi}$. Scheffé (1942b) showed that when it is applied to the logarithm of the ratio of variances from two normal populations, it gives the shortest unbiased system. The expected lengths of $\delta_1$, $\delta_2$, $\delta_3$, $\delta_4$ and Eudey's shortest unbiased system, say $\delta_5$, for n = 5, $\epsilon = 0.20$ were evaluated using the National Bureau of Standards Tables and are graphed in Fig. 3 as a function of $\pi$.



In this example at least, 6, and 6, are uniformly better than 6, according to this criterion, 
and 6, is better than 3, except for 7=0 and 1, but the shortest unbiased system 4; is not 
uniformly best. Of the four systems based on the discrete variable, 6, approximates 4; 
best with respect to the desirable property of relatively short expected length for 7 near 
0 and 1. 

The Sterne system $\delta_3$ (or $\delta_4$) minimizes the expected length of confidence interval if $\pi$ has an a priori distribution which makes all values of r equally probable. Such an a priori distribution is the uniform distribution over (0, 1).





Fig. 3. Expected lengths of various confidence intervals for n = 5, $1 - \epsilon = 0.80$. (Ordinate: expected length of interval; abscissa: $\pi$.)


The systems may be compared on the basis of the probability of covering the true parameter value $\pi$, or of failing to cover, as shown in Fig. 2 for $\delta_1$ and $\delta_4$, n = 9, $\epsilon = 0.10$. The probabilities for $\delta_2$ and $\delta_3$ either fall below the curves shown or coincide with them. The probability for $\delta_4$ rises to $\epsilon$ at only 16 points in (0, 1). The probability for $\delta_1$ never rises to $\epsilon$, having an absolute maximum of 0.088. If a uniform a priori distribution of $\pi$ over (0, 1) is assumed, the mean probabilities of covering $\pi$ by $\delta_1$, $\delta_2$, $\delta_3$ and $\delta_4$ are 0.965, 0.950, 0.938 and 0.935 respectively.


I should like to acknowledge the collaboration of Eleanor G. Crow and Robert S. Gardner in the calculations and to thank Theodore E. Sterne for communicating his results to me prior to publication.







REFERENCES 


CLARK, R. E. (1953). Percentage points of the Incomplete Beta Function. J. Amer. Statist. Ass. 48, 831.

CLOPPER, C. J. & PEARSON, E. S. (1934). The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika, 26, 404.

EUDEY, M. W. (1949). On the treatment of discontinuous random variables. Technical Report No. 13, Statistical Laboratory, University of California, Berkeley.

HALD, A. (1952). Statistical Tables and Formulas. New York: John Wiley and Sons.

KENDALL, M. G. (1946). The Advanced Theory of Statistics, 2. London: Charles Griffin and Company.

NATIONAL BUREAU OF STANDARDS (1949). Tables of the Binomial Probability Distribution. Washington: U.S. Government Printing Office.

NEYMAN, J. (1937). Outline of a theory of statistical estimation based on the classical theory of probability. Phil. Trans. A, 236, 333.

PEARSON, E. S. (1950). On questions raised by the combination of tests based on discontinuous distributions. Biometrika, 37, 383.

PEARSON, E. S. & HARTLEY, H. O. (1954). Biometrika Tables for Statisticians, 1, Table 16. Cambridge University Press.

SCHEFFÉ, H. (1942a). On the theory of testing composite hypotheses with one constraint. Ann. Math. Statist. 13, 280.

SCHEFFÉ, H. (1942b). On the ratio of the variances of two normal populations. Ann. Math. Statist. 13, 371.

SCHEFFÉ, H. (1943). On solutions of the Behrens-Fisher problem based on the t-distribution. Ann. Math. Statist. 14, 35.

STERNE, T. E. (1954). Some remarks on confidence or fiducial limits. Biometrika, 41, 275.

THOMPSON, C. M. (1941). Tables of percentage points of the Incomplete Beta-Function. Biometrika, 32, 181.

U.S. ARMY ORDNANCE CORPS (1952). Tables of the Cumulative Binomial Probabilities. Ordnance Corps Pamphlet ORDP 20-1. Washington 25, D.C.: Office of Technical Services, Department of Commerce (Order No. PB111389).













SERIAL CORRELATION IN REGRESSION ANALYSIS. II 


By G. S. WATSON* AND E. J. HANNAN


Australian National University, Canberra, A.C.T. 


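1. INTRODUCTION


In a previous paper (Watson, 1955)† a theoretical analysis was made of the effect of making a wrong presumption concerning the correlation matrix of the residuals from a regression of the form

\[
\begin{bmatrix} y_1 \\ \vdots \\ y_N \end{bmatrix}
=
\begin{bmatrix} x_{11} & \cdots & x_{1k} \\ \vdots & & \vdots \\ x_{N1} & \cdots & x_{Nk} \end{bmatrix}
\begin{bmatrix} \beta_1 \\ \vdots \\ \beta_k \end{bmatrix}
+
\begin{bmatrix} u_1 \\ \vdots \\ u_N \end{bmatrix},
\tag{1.1}
\]

or $y = X\beta + u$,

when the vector $u$ has zero expectation and
$E(uu') = \sigma^2\alpha$ ($\alpha$ non-singular).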


In particular, bounds were obtained for the following quantities: 

(a) The efficiency of the estimates of the $\beta_i$ (see equation (1.3.5)). This is measured by the ratio of the determinantal values of the covariance matrices of the best linear unbiased estimates and the actual estimates used.

(b) The significance points for the test of the hypothesis


fiB=¢, (i=1,...,h<h), 


where f;, ¢; are given and the f; are linearly independent. The test statistic used was that 
appropriate to the case a = I (see equations (1-4-33), (1-4-37), (1-4-44) and (1-4-54)). 

The bounds in each case were shown to depend upon the latent roots (particularly the extreme latent roots) of the matrix $H\alpha H'$, where $H$ is such that $H\gamma H' = I$ and $\gamma$ is the presumed correlation matrix. These latent roots are therefore the latent roots of $\alpha\gamma^{-1}$.

In this paper we go on to study cases where the errors, $u_i$, are generated by certain stationary processes whose form or parameters are incorrectly prescribed a priori. In §2 the spectral theory of stochastic processes is used to study the latent roots of $\alpha\gamma^{-1}$. In §3 the functions of these latent roots which are needed to examine the bounds to quantities (a) and (b) above are evaluated when the true and assumed error processes are autoregressive and moving-average. In §4 the results of §3 are discussed and illustrated with tables and graphs. In general, the bounds are wide, indicating that the effect of not knowing the true form of the error process may be severe. In §5 certain asymptotic results are established to show that, in many cases which occur in practice, the bounds may be attained.

* The results in §§3, 4 were obtained while one of us (G.S.W.) was a member of the Department 
of Applied Economics, Cambridge. 


† For convenient reference to equations in Part I of this paper, we merely prefix their numbers by unity; thus equation (1.2.5) is equation (2.5) of Part I.













2. THE LATENT ROOTS OF $\alpha\gamma^{-1}$


Consider the case where the $u_i$ are generated by a stationary process with absolutely continuous spectral function, the spectral density being

\[
f(\omega) = \sum_{j=-\infty}^{\infty} \rho_j e^{ij\omega}, \tag{2.1}
\]

the $\rho_j$ being the serial correlations of the process.
If we assume that the true spectral density is $g(\omega)$ with

\[
g(\omega)^{-1} = k \left| \sum_{j=0}^{\infty} b_j e^{-ij\omega} \right|^2 \quad (b_0 = 1), \tag{2.2}
\]








then we shall apply an autoregressive transformation to (1.1) obtaining a system of errors which will be, at least asymptotically, of the form

\[
v_i = \sum_{j=0}^{\infty} b_j u_{i-j}. \tag{2.3}
\]

The spectral density of the $v_i$ will then be
ee 


where $\theta^2$ is the variance of the $v_i$.

The correlation matrix corresponding to (2.4) will be $\sigma^2\theta^{-2}H\alpha H'$, so that the function generating, as Fourier coefficients, the elements of $H\alpha H'$ is $h(\omega)$. For the class of processes we shall be considering it can be shown that the latent roots of $H\alpha H'$ (that is, the latent roots of $\alpha\gamma^{-1}$) will be, asymptotically, the values of $h(\omega)$ for equidistant values of the argument (in the range $-\pi$ to $\pi$) (see Whittle, 1951; Grenander, 1952).
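The asymptotic statement can be illustrated numerically in the simplest setting. The sketch below is an illustration under stated assumptions, not the authors' calculation: the errors are taken to be first-order autoregressive with serial correlations $\rho^{|j|}$, the presumed matrix is the identity (so the matrix in question is just the correlation matrix), and the largest latent root is found by a plain power iteration. The function names are hypothetical.

```python
from math import cos, pi

def ar1_density(omega, rho):
    """sum_j rho^|j| e^{ij omega} = (1 - rho^2) / (1 + rho^2 - 2 rho cos omega)."""
    return (1 - rho**2) / (1 + rho**2 - 2 * rho * cos(omega))

def largest_root(rho, n, iters=200):
    """Power iteration on the n x n matrix with entries rho^|i-j|."""
    row = [rho**d for d in range(n)]
    x = [1.0] * n
    for _ in range(iters):
        y = [sum(row[abs(i - j)] * x[j] for j in range(n)) for i in range(n)]
        scale = max(abs(v) for v in y)
        x = [v / scale for v in y]
    y = [sum(row[abs(i - j)] * x[j] for j in range(n)) for i in range(n)]
    # Rayleigh quotient x'Ax / x'x for the converged vector
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

rho, n = 0.5, 60
# Density evaluated at n equidistant arguments in the range -pi to pi
grid_max = max(ar1_density(2 * pi * k / n - pi, rho) for k in range(n))
print(round(largest_root(rho, n), 3), round(grid_max, 3))
```

The largest root stays just below the greatest value of the density, and the gap narrows as n grows.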

The extreme latent roots of $\alpha\gamma^{-1}$ can then be approximated by the least and greatest values of $h(\omega)$ in the range $-\pi$ to $\pi$, while the sum of the latent roots will be, approximately,


i sia . 
H = an | Mo) do =N-. (2-5) 


We shall indicate the extreme values of $h(\omega)$ in $-\pi$ to $\pi$ by $h_l$ and $h_u$.

In the original form of this paper (Watson, 1951), a different approach was used to obtain 
the results of this paper. Autoregressive and moving average processes of the first and 
second order were approximated by processes with commutative covariance matrices 
whose latent roots and vectors were known. The present approach is more general but gives 
no guide to the effect of the approximations. The earlier work showed that for the cases to 
be considered below the use of the extreme values of (2-4) and the integral (2-5) will result 
in errors which are sufficiently small, for our purposes, even for samples of only 15 or 20. 
Similarly, in formulae such as (1.2.5), the average of the $N - k$ largest (or smallest) roots of $\alpha\gamma^{-1}$ appears. We shall replace this by the average of all of the latent roots. Again the effect will become small as $N$ increases relative to $k$.

We shall consider cases where the spectral densities $f(\omega)$ and $g(\omega)$ are rational functions of $\cos\omega$, which corresponds to the cases where we have

\[
\sum_{j=0}^{p} \lambda_j u_{i-j} = \sum_{j=0}^{q} \mu_j \epsilon_{i-j}, \tag{2.6}
\]

the $\epsilon_i$ being independent random variates. We shall use $\lambda_j$ and $\mu_j$ for the true process as indicated in (2.6) and $l_j$, $m_j$ for the corresponding parameters in the assumed process. We shall use $p'$ and $q'$ for the maximum lags in the assumed process.

The spectral density of a process, $u_i$, of the form (2.6) is (Doob, 1953)

\[
f(\omega) = \sigma^2 \, \frac{\left| \sum_{j=0}^{q} \mu_j e^{ij\omega} \right|^2}{\left| \sum_{j=0}^{p} \lambda_j e^{ij\omega} \right|^2} \quad (\lambda_0 = \mu_0 = 1). \tag{2.7}
\]

Here $\sigma^2$ is the variance of the process $\epsilon_i$. When the assumed process is also of the form (2.6), the evaluation of (2.5) or the extreme values of (2.4) is relatively straightforward, though care has to be exercised in some cases.

The bounds which are to be developed are homogeneous functions of degree zero in the latent roots so that we can neglect the factors $k$ and $\sigma^2\theta^{-2}$, which appear in $h(\omega)$, without affecting the result, and this will be done below.

Then (2-5) becomes 

@ 
>> 5555p) 4-51 q 
H = NS" _—__ >> p#. (28) 
> NAPs 


j= 


3. THE EFFECTS OF A WRONG PRESCRIPTION FOR AN ERROR 
PROCESS OF THE FORM (2.6)


The bounds to (a) and (b) (of §1) obtained in Part I of the present study depended on the extreme roots and the sum of the roots of the matrix $\alpha\gamma^{-1}$ after the removal of the roots corresponding to any regressor vectors which were latent vectors of the covariance matrix of the transformed residuals. In using the approximations $h_l$ and $h_u$ for the extremes and $H$ for the sum of the roots we shall, of course, be neglecting the effect of the removal of these roots. We shall also be neglecting the effect of the value of $k$, in the bounds to (b). The effect of the resulting errors will, of course, vanish asymptotically, but it also appears to be small for quite small values of $N$. We shall evaluate the bounds to the efficiency only for the case where one regressor vector is not a latent vector. The more general case, where none of the $k$ regressor vectors are latent vectors, can be treated, approximately, by raising the bounds for the case $k = 1$ to the power $k$.

Below, for certain processes of the general form (2.3), we give the value of $h(\omega)$, $H$, $h_l$ and $h_u$. In addition we show, sometimes for particular values of the parameters involved, the lower bound to the efficiency of this estimate, $E_l$ (the upper bound is, of course, unity), and the factors

\[
f_l = N\frac{h_l}{H}, \qquad f_u = N\frac{h_u}{H}. \tag{3.1}
\]

These last factors, when multiplied by the significance point for $t^2$ appropriate to the case where $h(\omega)$ is unity and the distribution of the $u_i$ is normal, give approximations to the true significance point (for the chosen level). This approximation was discussed in the last paragraph of Part I of this investigation.














iS as 


2-7) 


orm 
ard, 


the 
out 


2-8) 


the 
rre- 
the 
for 
ots. 
t of 
for 
lere 
e k 
nds 


Lh, 
ed, 
ty), 


3+1) 


ase 
rue 
last 








G. S. Watson AND E. J. HANNAN 439 


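(i) $p = 1$, $q = 0$, $p' = 1$, $q' = 0$.
Here it will be convenient to put $-\lambda_1 = \rho_1$ and $-l_1 = r_1$:

\[
h(\omega) = \frac{1 + r_1^2 - 2r_1\cos\omega}{1 + \rho_1^2 - 2\rho_1\cos\omega},
\]

\[
H = N\,\frac{1 + r_1^2 - 2r_1\rho_1}{1 - \rho_1^2},
\]

\[
h_l = \left(\frac{1 - r_1}{1 - \rho_1}\right)^2, \quad h_u = \left(\frac{1 + r_1}{1 + \rho_1}\right)^2 \quad (\rho_1 \le r_1),
\]

\[
h_l = \left(\frac{1 + r_1}{1 + \rho_1}\right)^2, \quad h_u = \left(\frac{1 - r_1}{1 - \rho_1}\right)^2 \quad (\rho_1 \ge r_1),
\]

\[
E_l = \frac{(1 - \rho_1^2)^2 (1 - r_1^2)^2}{(1 + r_1^2 - 4r_1\rho_1 + \rho_1^2 + \rho_1^2 r_1^2)^2},
\]

\[
f_u = \left(\frac{1 - r_1}{1 - \rho_1}\right)^2 \frac{1 - \rho_1^2}{1 - 2r_1\rho_1 + r_1^2}, \quad
f_l = \left(\frac{1 + r_1}{1 + \rho_1}\right)^2 \frac{1 - \rho_1^2}{1 - 2r_1\rho_1 + r_1^2} \quad (\rho_1 \ge r_1).
\]

When $\rho_1 < r_1$ the bounds $f_u$ and $f_l$ are interchanged. The values of $E_l$, $f_u$ and $f_l$ for various values of $\rho_1$ and $r_1$ are shown respectively in Tables 1-3 below.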
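The quantities of case (i) are simple enough to evaluate directly. The following Python sketch uses a hypothetical function `ar1_bounds` and assumes that the lower bound to the efficiency for one regressor takes the form $4h_lh_u/(h_l+h_u)^2$, which agrees algebraically with the closed form for $E_l$ in this case. It is run on the worked example of §4, $r_1 = 0.7$ and $\rho_1 = 0.5$.

```python
def ar1_bounds(rho1, r1):
    """Bounds when a first-order autoregressive error with first serial
    correlation rho1 is treated as one with first serial correlation r1
    (both less than 1 in absolute value)."""
    at_zero = ((1 - r1) / (1 - rho1)) ** 2    # h(omega) at omega = 0
    at_pi = ((1 + r1) / (1 + rho1)) ** 2      # h(omega) at omega = pi
    h_l, h_u = min(at_zero, at_pi), max(at_zero, at_pi)
    mean_root = (1 + r1**2 - 2 * r1 * rho1) / (1 - rho1**2)   # H / N
    e_l = 4 * h_l * h_u / (h_l + h_u) ** 2    # assumed form of the k = 1 bound
    return {"E_l": e_l, "f_l": h_l / mean_root, "f_u": h_u / mean_root}

b = ar1_bounds(rho1=0.5, r1=0.7)
print({name: round(value, 2) for name, value in b.items()})
```

The two factors printed for this example correspond to the multipliers of the tabular $t^2$ quoted in §4.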


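(ii) $p = 2$, $q = 0$, $p' = 1$, $q' = 0$.
Again it will be convenient to put $-\lambda_1(1+\lambda_2)^{-1} = \rho_1$ and $-l_1 = r_1$:

\[
h(\omega) = \frac{1 + r_1^2 - 2r_1\cos\omega}{1 + \lambda_1^2 + \lambda_2^2 + 2\lambda_1(1+\lambda_2)\cos\omega + 2\lambda_2\cos 2\omega},
\]

\[
H = N\,\frac{1 + r_1^2 - 2r_1\rho_1}{1 - \lambda_1^2 - \lambda_2^2 - 2\lambda_1\lambda_2\rho_1}.
\]

Case (a). $r_1 = 0$, $\left|\dfrac{\lambda_1(1+\lambda_2)}{4\lambda_2}\right| \ge 1$.

The extreme values of $h(\omega)$ are

\[
\frac{1}{(1+\lambda_2)^2(1-\rho_1)^2}, \qquad \frac{1}{(1+\lambda_2)^2(1+\rho_1)^2},
\]

\[
E_l = (1-\rho_1^2)^2(1+\rho_1^2)^{-2} \quad \text{(see Table 1 for } r_1 = 0\text{)}.
\]

Case (b). $r_1 = 0$, $\left|\dfrac{\lambda_1(1+\lambda_2)}{4\lambda_2}\right| < 1$.

The extreme values $h_l$ and $h_u$ are then obtained from one of those under case (a) and

\[
\frac{1}{(1-\lambda_2)^2\left\{1 - \dfrac{\lambda_1^2}{4\lambda_2}\right\}}
\]

(depending on the signs of $\lambda_1$ and $\lambda_2$).
$E_l$ will not be written down but is shown for some $\lambda_1$ and $\lambda_2$ in Table 4.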









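Case (c). $r_1 = \rho_1$.

\[
h_u = \frac{1}{(1-\lambda_2)^2}, \quad h_l = \frac{1}{(1+\lambda_2)^2} \quad (\lambda_2 > 0),
\]

\[
h_u = \frac{1}{(1+\lambda_2)^2}, \quad h_l = \frac{1}{(1-\lambda_2)^2} \quad (\lambda_2 < 0),
\]

\[
E_l = \left(\frac{1-\lambda_2^2}{1+\lambda_2^2}\right)^2.
\]

See Table 1 for $r_1 = 0$ reading $\lambda_2$ for $\rho_1$.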
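The extreme values in case (c) can be checked numerically by evaluating $h(\omega)$ on a fine grid. The sketch below is illustrative only: the function name and the parameter values $\lambda_1 = -0.5$, $\lambda_2 = 0.3$ are assumptions chosen to give a stationary second-order process, and are not taken from the paper.

```python
from math import cos, pi

def h_case_c(omega, lam1, lam2):
    """h(omega) for a true second-order autoregressive error (lam1, lam2)
    and an assumed first-order process with r1 set equal to rho1."""
    rho1 = -lam1 / (1 + lam2)
    num = 1 + rho1**2 - 2 * rho1 * cos(omega)
    den = (1 + lam1**2 + lam2**2 + 2 * lam1 * (1 + lam2) * cos(omega)
           + 2 * lam2 * cos(2 * omega))
    return num / den

lam1, lam2 = -0.5, 0.3   # illustrative values only
values = [h_case_c(pi * k / 20000, lam1, lam2) for k in range(20001)]

# Grid extremes next to the closed forms 1/(1 + lam2)^2 and 1/(1 - lam2)^2
print(round(min(values), 4), round(1 / (1 + lam2) ** 2, 4))
print(round(max(values), 4), round(1 / (1 - lam2) ** 2, 4))
```

Since $h(\omega)$ is even in $\omega$, the grid over 0 to $\pi$ covers the whole range $-\pi$ to $\pi$.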
The effects of these incorrect prescriptions of the error process on the test of significance are discussed in the following section.


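(iii) $p = 0$, $q = 2$, $p' = 0$, $q' = 1$.

\[
\rho_1 = \frac{\mu_1(1+\mu_2)}{1+\mu_1^2+\mu_2^2}, \qquad r_1 = \frac{m_1}{1+m_1^2},
\]

\[
h(\omega) = \frac{1 + \mu_1^2 + \mu_2^2 + 2\mu_1(1+\mu_2)\cos\omega + 2\mu_2\cos 2\omega}{1 + m_1^2 + 2m_1\cos\omega}.
\]

We consider only the case where $r_1 = \rho_1$. Since $r_1 < \tfrac12$ this imposes the restriction $\rho_1 < \tfrac12$: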


= Sh e1(— — 2pj i)}. 
1+mj Py 





(1—4p7)! 
The extremes h,, and h, are obtained from 





es +43 (1, 2Ps ' \ 
eat a a “F2 — 22) — 1 
. 1+ 7 2 , ia 
and one of +a ( +2 *s) , (PiP2 positive), 
1+ ei+ 13 2Ps ‘ 
+m 1 1+ 2p,)” (PP, negative). 


Only the bound $E_l$ was evaluated and this is shown in Table 5 and Fig. 1.
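(iv) $p = 0$, $q = 1$, $p' = 1$, $q' = 0$.
Again we put $\mu_1(1+\mu_1^2)^{-1} = \rho_1$ and $-l_1 = r_1$:

\[
h(\omega) = (1 + \mu_1^2 + 2\mu_1\cos\omega)(1 + r_1^2 - 2r_1\cos\omega),
\]

\[
H = N\{(1 + r_1^2)(1 + \mu_1^2) - 2r_1\mu_1\}.
\]

Case (a). $\left|\dfrac{(\mu_1 - r_1)(1 - \mu_1 r_1)}{4\mu_1 r_1}\right| \ge 1$.

The extremes of $h(\omega)$ are then

\[
(1 + r_1)^2(1 - \mu_1)^2, \qquad (1 - r_1)^2(1 + \mu_1)^2,
\]

\[
E_l = (1 - r_1^2)^2(1 - \mu_1^2)^2(1 + r_1^2 - 4r_1\mu_1 + \mu_1^2 + r_1^2\mu_1^2)^{-2},
\]

which is shown in Table 1, putting $\mu_1 = \rho_1$.

Case (b). $\left|\dfrac{(\mu_1 - r_1)(1 - \mu_1 r_1)}{4\mu_1 r_1}\right| < 1$.

The extremes of $h(\omega)$ are

\[
(1 + r_1^2)(1 + \mu_1^2) + \frac{(\mu_1 - r_1)^2(1 - \mu_1 r_1)^2}{4\mu_1 r_1}
\]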




and one of
    $(1 + r_1)^2(1 - \mu_1)^2$ $(r_1 < \mu_1)$,
    $(1 - r_1)^2(1 + \mu_1)^2$ $(r_1 > \mu_1)$.
The efficiency $E_l$ is shown, for this case, in Table 6.
Other cases may be considered, perhaps for restricted values of the parameters, by suitably identifying their parameters with those of examples (i)-(iv) above. This is because of the inverse relationship of the spectral density functions of the autoregressive and moving-average processes. Some examples are

    $p = 1$, $q = 1$, $p' = 0$, $q' = 0$ ≡ (i),
    $p = 0$, $q = 1$, $p' = 0$, $q' = 1$ ≡ (i),
    $p = 1$, $q = 1$, $p' = 0$, $q' = 1$ ≡ (i),
    $p = 1$, $q = 0$, $p' = 0$, $q' = 1$ ≡ (i),
    $p = 1$, $q = 1$, $p' = 1$, $q' = 0$ ≡ (ii),
    $p = 0$, $q = 2$, $p' = 0$, $q' = 0$ ≡ (iv),
    $p = 0$, $q = 0$, $p' = 2$, $q' = 0$ ≡ (iv).


4. COMMENTS ON THE RESULTS OF §3


From the last paragraph of §3, Table 1 is relevant to several cases but is labelled for the case where a first-order autoregressive error process, first serial correlation $\rho_1$, is mistaken for a process of the same form but with first serial correlation $r_1$.

Table 1 is the most widely applicable of the three tables as it is of use in cases (i), (iia), (iic), (iva), and in other situations which, by the last paragraph of §3, are equivalent to these cases. The discussion is given in terms of case (i). The only striking feature of Table 1 is the poor performance of the high values of $r_1$. Thus it appears that the first difference transformation may lead to very low efficiencies unless $\rho_1 = 1$; the zero efficiency at $\rho_1 = 1$ is due to the approximation. It will be noted that, except for high $\rho_1$ and $r_1$, a value of $r_1$ within 0.1 or 0.2 of $\rho_1$ leads to efficiencies of better than 70%. It is only when $\rho_1$ is high that the use of a near exact value of $r_1$ is imperative for a reasonably efficient analysis.

Table 4 covers the case where a second-order autoregressive process is mistaken for a process of independent variates. When the procedure appropriate to independent residuals is used, the first column of Table 1 or Table 4 applies, according as $|(\lambda_1 + \lambda_1\lambda_2)(4\lambda_2)^{-1}| \ge 1$ or $< 1$. Table 4 also shows the corresponding values of $(1-\rho_1^2)^2(1+\rho_1^2)^{-2}$, appropriate when $|(\lambda_1 + \lambda_1\lambda_2)(4\lambda_2)^{-1}| \ge 1$, for comparison. Thus while there are second-order autoregressive processes which give direct least squares an efficiency which is dependent only on the first serial correlation, $\rho_1 = -\lambda_1(1+\lambda_2)^{-1}$, and is high when $\rho_1$ is small, there are others which make the efficiency very low even when $\rho_1$ is small.

Table 5 covers the case where a second-order moving average is mistaken for a first-order 
moving average with the correct first serial correlation p,. The results are displayed graphic- 
ally in Fig. 1. For each pair of values of 7, and /, the correlograms of the true and assumed 
error processes have been drawn. The graphs have been arranged in the same order as in 
Table 5, and on each the lower bound to the efficiency is given. An examination of Fig. 1 
shows that an apparently better agreement between the true and assumed correlograms may 
lead to a worse lower bound to the efficiency. This suggests that the common practice of 
comparing correlograms may not be the best procedure. 










As has been seen in § 2 the lower bound to the efficiency is simply related to the spectral 
densities rather than to the correlograms. For the cases which we are considering the 
lower bound to the efficiency can be zero only if the true spectral density has a zero. This is 
seen to be so for all the cases with a zero lower bound in Table 5. This lower bound will of 
course be attained (or nearly attained) for regressor vectors which are latent vectors of 
the correlation matrix of the transformed residuals corresponding to the zero (or near zero) 
latent root, but it may also be attained, at least asymptotically, for other regressor vectors 
also (see Grenander, 1952, p. 568, and § 5 below). 

Table 6 covers the case where a first-order moving average is mistaken for a simple Markoff process. We have two cases according as $|(\mu_1 - r_1)(1 - \mu_1 r_1)(4\mu_1 r_1)^{-1}|$ is $\ge$ or $< 1$. The first case is covered by Table 1 with $\mu_1 = \rho_1$ and the second by Table 6. Both show that the efficiency falls off more rapidly as $\rho_1$ and $r_1$ diverge, than for cases where the two processes are of the same type, as is to be expected.

Finally, Tables 2 and 3 show the correcting factors which may be applied to the significance points for the test of significance of a regression coefficient (based on $t^2$) when the true and assumed error processes are first-order autoregressive. The meaning of these tables may be seen from an example. Suppose that the t-test has been made when the error has a first serial correlation of 0.5 but when the estimation procedure has been based on an assumed first serial correlation of 0.7. Since $r_1 = 0.7$, $\rho_1 = 0.5$, the true significance point lies between $1.22t^2$ and $0.34t^2$, where $t^2$ is the tabular value, assuming, of course, that the approximation is exact.

In order to find the order of magnitude of the possible effects of serial correlation on the 
t-tests and to investigate the range of validity of the approximation, either numerical 
examples or the asymptotic approach of § 5 must be considered. In the calculations shown 
in Table 7, only the extreme significance levels of t-tests using the tabular significance points 
were found (for the case of only one regressor in addition to the mean). This avoids the 
difficulty of inverse interpolation. Although Table 7 is not extensive, it shows that the first 
approximation is quite a useful guide. The exact results were found by using the method of 
Pitman & Robbins (1949), the calculations being performed by the EDSAC in the Mathe- 
matical Laboratory, University of Cambridge. The second approximation there shown was 
obtained by making the first two small-sample moments agree for the case where there was 
only one regressor apart from the mean. For this case the adjustment corresponding to the 
use of the f, and f,, also makes the first two moments correct, asymptotically, so that a fair 
agreement is to be expected. The validity of the analysis corresponding to the assumption 
that the errors follow a first-order process whose first serial correlation equals that of the 
true error process, when in fact the true error process is of the second order, will now be 
examined by considering the disturbances in the significance levels of the t-test of regression 
coefficients. Let it first be supposed that the true errors follow a second-order autoregressive
process with $\lambda_1 = -1.3$ and $\lambda_2 = 0.3$. This is the process suggested by Orcutt (1948) for
Tinbergen's series. Since $\lambda_2 < \tfrac14\lambda_1^2$, the correlogram of this process does not have harmonic
oscillations. It will be seen further that it has a weak central tendency. The first serial
correlation of the process is unity, so that the lower bound to the efficiency of a least-squares
analysis is zero. Here we are supposing that a first-order autoregressive transformation
based on $r_1 = 1$ has been used, i.e. that the analysis has been carried out on the first
differences of the series involved. In this situation the lower bound to the efficiency may be
computed and is 0.70. In a sample of 20, the use of the second approximation for determining
the bias in the significance level of the classical t-test at 5% shows that the true
significance level may be almost as small as 1 % and as great as 16%. As a second example, 
we will suppose that the errors follow a second-order autoregressive process with $\lambda_1 = -0.5$
and $\lambda_2 = -0.4$. The correlogram does not have a harmonic oscillation. If least squares had
been used directly, the efficiency lower bound would have been 0.02. However, if the
analysis had been carried out on the data after the use of a first-order autoregressive transformation
with $\rho_1 = -\lambda_1(1+\lambda_2)^{-1} = 0.83$, the efficiency lower bound is raised to 0.54. In the
latter circumstances, the true significance level of a t-test made with the tabulated 5%
point may range from 0.3% to slightly greater than 20%. These calculations confirm the
idea that better results are obtained in this case by the use of a first-order autoregressive 
transformation than without it, but they also show clearly that the performance may still 
be poor even when the first serial correlation is known free of error. 
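
The first serial correlations quoted in the two autoregressive examples can be checked with a few lines of Python. This is only a sketch: it assumes the sign convention $e_t + \lambda_1 e_{t-1} + \lambda_2 e_{t-2} = \epsilon_t$ for the second-order process, and the function names are illustrative rather than taken from the paper.

```python
def ar2_first_serial_correlation(lam1, lam2):
    """First serial correlation of e_t + lam1*e_{t-1} + lam2*e_{t-2} = eps_t.

    Assumed convention (not stated explicitly in this section); it gives
    rho_1 = -lam1 / (1 + lam2), the expression used in the text.
    """
    return -lam1 / (1.0 + lam2)


def has_harmonic_oscillation(lam1, lam2):
    """True when the characteristic roots are complex, i.e. lam2 > lam1**2 / 4."""
    return lam2 > 0.25 * lam1 ** 2


if __name__ == "__main__":
    for lam1, lam2 in [(-1.3, 0.3), (-0.5, -0.4)]:
        rho1 = ar2_first_serial_correlation(lam1, lam2)
        print(lam1, lam2, round(rho1, 2), has_harmonic_oscillation(lam1, lam2))
```

For the two parameter pairs used above, the sketch gives first serial correlations of unity and 0.83 respectively, and neither correlogram oscillates harmonically.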

As a last example, consider an error process which is a first-order moving average with
$\lambda_1 = 0.6$, while it is presumed to be a Markoff process with $r_1 = \lambda_1(1+\lambda_1^2)^{-1} = 0.44$, when
20 observations are available. The ratios of the least and greatest roots to the mean root are
0.330 and 1.49 respectively. Since the 5% point of $t^2$ with 18 degrees of freedom is 4.41,
the first approximation tells us that a test using this point may have any significance level 
between P(x > aa and P(F,x0> ra} , Le. lie between 0-4 and 10-5%. These prob- 
abilities agree well with 0-3 and 10-5°%%, found by using the second approximation above. 


5. REGRESSION ON ANALYTIC FUNCTIONS 


Grenander (1954) and Grenander & Rosenblatt (1954) consider regressor vectors $x_i$ (orthonormal)
which are subjected to restrictions which effectively require the existence of the limits
\[ r_{ij}(h) = \lim_{n\to\infty} \sum_{t=1}^{n-h} x_i(t+h)\,x_j(t) \qquad (i, j = 1, \ldots, k), \]
and permit the representation
\[ R_h = [r_{ij}(h)] = \int_{-\pi}^{\pi} e^{ih\theta}\, dM(\theta), \]
where $M(\theta)$ is a matrix of functions with $M(\theta_1) - M(\theta_2)$ Hermitian non-negative definite
for $\theta_1 > \theta_2$. It is presumed that $R_0$ is non-singular. Introducing $N(\theta) = M(\theta+) - M(-\theta-)$
(which is symmetric and real), it is shown by Grenander & Rosenblatt that if $N(\theta)$ has only
$s \le k$ points of increase $\theta_1, \ldots, \theta_s$ such that
(i) $dN(\theta_j) = [n_{il}(j)] = N_j$ is non-null,
(ii) $N_i N_j$ = null matrix ($i \ne j$),
     = $N_i$ ($i = j$),
then the least-squares procedure gives estimates of the regression coefficients which have, 
asymptotically, the same covariance matrix as best linear unbiased estimates. These 
conditions are also shown to be necessary for this to be so. 
In fact it is easy to show (under fairly weak restrictions on $f(\omega)$)* that


lim [x,0x,] = [ 3 10,) (0) 


= 5 f(6,)N,, (5-1) 
j=1 


* In particular, $f(\omega)$ must not have zeros at the points $\theta_j$ (see Grenander, 1952, p. 568).
















and taking into account the properties of the N; it is evident that the latent roots of X’aX 
are, in the limit, the $f(\theta_j)$ repeated $k_j$ times (where $k_j = \operatorname{tr} N_j$, integral; $\sum_j k_j = k$). The
sufficiency part of the theorem then follows immediately from (1.3.1) (see Grenander, 1954,
p. 263). 

These results may be used to suggest asymptotic approximations to the quantities for 
which we have above provided bounds. It will be seen below that these bounds are often 
attained. 

An important example of a set of regressors satisfying Grenander & Rosenblatt's theorem
is the set of orthogonal (Legendre) polynomials, for which $s = 1$ and $\theta_1 = 0$.

Applying the asymptotic result (5-1) to the formula (1-2-4) for the fractional bias of the 
ith regression coefficient, it is seen to reduce to 


gE F0,) mal. (5-2) 
The statistic R of (1-4-49) for testing all the k regression coefficients, takes the form 
2 2 
R = HA) M+ «+ MOe) Te (5-3) 
Le, F 
1 


with each f(4;) repeated k; times, since the roots v; of X’aX (see (1-4-50) and (1-4-52)) are 
asymptotically the f(0;). Thus 7' of (1-4-46) is given by T’ = R(1—R)-1, with R expressed 
by (5:3). The relationship of 7’ with the bounding random variables, given in (1-4-54), 
depends on the relation of f(9),, ...,f(@,) with a,, ...,@,. The approximations which lead to 


the factors (3-1) for the bounding random variables suggest that here 7’ should be approxi- 


mated by 








l k 
Ef) Sat 
i: (5-4) 
= N—>Dk,f(0; n 
von hf) Ni 
If the regressors are orthogonal polynomials, (5.2) reduces to
\[ \frac{N(1 - f(0))}{N - k} \simeq 1 - f(0), \qquad (5.5) \]
and the constant factor of (5-4) becomes 
(N —k)f(0) 
pe A y b 5-6 
Woe *f) (5°6) 


Supposing further that the error process is Markoff, parameter $\rho$ and that, as Grenander &
Rosenblatt's theorem would suggest, least squares has been used, the value of $f(0)$ by (2.7)
is $(1-\rho)^{-2}(1-\rho^2)$. Then (5.5) becomes
\[ -\frac{N}{N-k}\,\frac{2\rho}{1-\rho}. \qquad (5.7) \]


Watson (1951) showed that this was the minimum bias for $\rho > 0$ (maximum for $\rho < 0$) for
this type of error process. Turning to the statistic $T$ discussed above, it is seen that, for $\rho > 0$,



























f(0) is essentially the largest of the roots ,,...,%, so that 7' is approximately the upper 
bounding random variable of (1.4.54). The approximation of using the factor (5.6), here
$(1+\rho)(1-\rho)^{-1}$, is exactly the same as using $f_u$ of case 3(a) with $r = 0$, since least squares
is assumed to have been used. When $\rho < 0$, $T$ is distributed like the lower bound of (1.4.54).
Thus, in this case, the bounding values are good approximations to the true values because
$f(0)$ happens to coincide with $f_u$ or $f_l$. This also happens when the errors are from a
second-order autoregressive process, falling in case 3(a), but not when it belongs to case 3(b).
Similar observations may be made in other cases. 

It is therefore evident that the bounds are attained in cases which must occur 
frequently in practice—analytic regressors and simple error processes. This is certainly a 
surprising result. Since these bounds are often wide, it is clear that no single rule can 
reasonably hope to cover all cases. 


Our thanks are due to Mr S. Gill of the Cambridge Mathematical Laboratory for cal- 
culating certain Pitman-Robbins series on the EDSAC and to Mrs E. M. Chambers and the 
Computing Staff of the Department of Applied Economics, University of Cambridge, for 
the remainder of the calculations. 


REFERENCES 


Doob, J. L. (1953). Stochastic Processes. New York: John Wiley.

Grenander, U. (1952). Ark. Mat. 1, 555.

Grenander, U. (1954). Ann. Math. Statist. 25, 252.

Grenander, U. & Rosenblatt, M. (1954). Proc. Nat. Acad. Sci., Wash., 40, 812.

Orcutt, G. H. (1948). J. R. Statist. Soc. B, 10, 1.

Pitman, E. J. G. & Robbins, H. E. (1949). Ann. Math. Statist. 20, 552.

Watson, G. S. (1951). Serial Correlation in Regression Analysis. Institute of Statistics, North Carolina.
Mimeograph Series, no. 49.

Watson, G. S. (1955). Biometrika, 42, 327.

Whittle, P. (1951). Hypothesis Testing in Time Series Analysis. Uppsala: Almqvist and Wiksells.













Table 1. Lower bound to efficiency of estimates. Case p = 1,q = 0, p’ = 1, q' = 0 





























































































































$\rho_1$ \ $r_1$    0      0.1    0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9

0          1.00   0.96   0.85   0.70   0.52   0.36   0.22   0.12   0.05   0.01
0.1        0.96   1.00   0.96   0.84   0.68   0.49   0.31   0.19   0.07   0.02
0.2        0.85   0.96   1.00   0.96   0.83   0.64   0.43   0.24   0.11   0.02
0.3        0.70   0.84   0.96   1.00   0.95   0.81   0.58   0.35   0.16   0.04
0.4        0.52   0.68   0.83   0.95   1.00   0.94   0.76   0.50   0.24   0.06
0.5        0.36   0.49   0.64   0.81   0.94   1.00   0.92   0.68   0.36   0.09
0.6        0.22   0.31   0.43   0.58   0.76   0.92   1.00   0.89   0.55   0.16
0.7        0.12   0.17   0.24   0.35   0.50   0.68   0.89   1.00   0.81   0.30
0.8        0.05   0.07   0.11   0.16   0.24   0.36   0.55   0.81   1.00   0.60
0.9        0.01   0.02   0.02   0.04   0.06   0.09   0.16   0.30   0.60   1.00
Table 2. $f_u = \left(\dfrac{(1-r_1)^2}{1-2\rho_1 r_1+r_1^2}\right)\left(\dfrac{1+\rho_1}{1-\rho_1}\right)$

$\rho_1$ \ $r_1$    0      0.1    0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9

0          1      0.80   0.62   0.45   0.31   –      –      –      –      –
0.1        1.22   1      0.78   0.58   0.41   0.27   –      –      –      –
0.2        1.50   1.25   1      0.76   0.54   0.36   –      –      –      –
0.3        1.86   1.58   1.29   1      0.73   0.49   0.30   0.16   –      –
0.4        2.33   2.03   1.70   1.35   1      0.69   0.42   0.23   0.09   –
0.5        –      2.67   2.29   1.86   1.42   1      0.63   0.34   0.14   0.03
0.6        –      –      3.20   2.68   2.12   1.54   1      0.55   0.24   0.05
0.7        –      –      –      4.14   3.40   2.58   1.74   1      0.44   0.10
0.8        –      –      –      –      6.23   5.00   3.60   2.19   1      0.24
0.9        –      –      –      –      –      13.57  10.86  7.43   3.80   1
Table 3. $f_l = \left(\dfrac{(1+r_1)^2}{1-2\rho_1 r_1+r_1^2}\right)\left(\dfrac{1-\rho_1}{1+\rho_1}\right)$

$\rho_1$ \ $r_1$    0      0.1    0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9

0          1      1.20   1.38   1.55   1.69   –      –      –      –      –
0.1        0.82   1      1.18   1.34   1.48   1.60   –      –      –      –
0.2        0.67   0.83   1      1.16   1.31   1.43   –      –      –      –
0.3        0.54   0.69   0.84   1      1.15   1.28   1.38   1.45   –      –
0.4        0.43   0.56   0.70   0.85   1      1.13   1.25   1.33   1.39   –
0.5        –      0.44   0.57   0.71   0.86   1      1.12   1.22   1.29   1.32
0.6        –      –      0.45   0.58   0.72   0.87   1      1.11   1.19   1.24
0.7        –      –      –      0.45   0.58   0.72   0.87   1      1.10   1.16
0.8        –      –      –      –      0.42   0.56   0.71   0.87   1      1.08
0.9        –      –      –      –      –      0.34   0.48   0.66   0.85   1
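
The legible entries of Tables 2 and 3 can be reproduced numerically. The sketch below assumes that the two tabulated factors have the forms $(1-r_1)^2(1-2\rho_1 r_1+r_1^2)^{-1}(1+\rho_1)(1-\rho_1)^{-1}$ and $(1+r_1)^2(1-2\rho_1 r_1+r_1^2)^{-1}(1-\rho_1)(1+\rho_1)^{-1}$; these forms were inferred by matching the printed values, and the function names are hypothetical.

```python
def table2_factor(rho1, r1):
    """Assumed form of the Table 2 factor (inferred from the printed values)."""
    return (1 - r1) ** 2 / (1 - 2 * rho1 * r1 + r1 ** 2) * (1 + rho1) / (1 - rho1)


def table3_factor(rho1, r1):
    """Assumed form of the Table 3 factor (inferred from the printed values)."""
    return (1 + r1) ** 2 / (1 - 2 * rho1 * r1 + r1 ** 2) * (1 - rho1) / (1 + rho1)


if __name__ == "__main__":
    # Spot checks against a few entries of the two tables.
    print(round(table2_factor(0.8, 0.4), 2))  # Table 2, rho1 = 0.8, r1 = 0.4
    print(round(table3_factor(0.4, 0.5), 2))  # Table 3, rho1 = 0.4, r1 = 0.5
```

With $r_1 = 0$ the first factor reduces to $(1+\rho_1)(1-\rho_1)^{-1}$, which is the value of $f(0)$ for a Markoff process used in § 5.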
















Table 4. Lower bound to efficiency of estimates.
Case p = 2, q = 0, p' = 1, q' = 0; $|\lambda_1(1+\lambda_2)^{-1}| < 1$

Parameters of the true error process
$\lambda_1$       $\lambda_2$       $\left(\dfrac{1-\rho_1^2}{1+\rho_1^2}\right)^2$       Lower bound to the efficiency

−1.8       0.9       0.0030       0.0003
−0.1       0.9       0.9888       0.0099
−1.0       0.5       0.1476       0.0769
−0.5      −0.4       0.0327       0.0175
−0.1      −0.8       0.3600       0.0122

Table 5. Lower bound to efficiency of estimates. Case p = 0, q = 2, p' = 0, q' = 1.
Bracketed figures are $\rho_1$ and $\rho_2$

$\lambda_2$ \ $\lambda_1$     0.2               0.5               0.8

−0.5         0.17              0.00              0.19
             (0.08, −0.39)     (0.17, −0.33)     (0.21, −0.27)
−0.2         0.76              0.43              0.00
             (0.15, −0.19)     (0.31, −0.16)     (0.38, −0.12)
0.2          0.79              0.38              –
             (0.22, 0.19)      (0.47, 0.16)
0.5          0.28              –                 –
             (0.23, 0.39)
1.0          0.00              0.00              –
             (0.20, 0.49)      (0.44, 0.44)
2.0          0.33              0.26              0.14
             (0.12, 0.40)      (0.29, 0.38)      (0.43, 0.36)




















Table 6. Lower bound to efficiency of estimates. Case p = 0, q = 1, p' = 1, q' = 0

$r_1$          0.2    0.2    0.2    0.2    0.4    0.4    0.4    0.4
$\lambda_1$          0.2    0.4    0.6    0.8    0.2    0.4    0.6    0.8
$\rho_1$          0.19   0.34   0.44   0.49   0.19   0.34   0.44   0.49

Minimum     0.99   0.81   0.42   0.10   0.81   0.90   0.55   0.15





Fig. 1. True and assumed correlograms from Table 5.
Table 7. Bounds to the 5% significance level in t-tests

Case                                   Exact (%)    First approximation (%)    Second approximation (%)

N = 11, $\rho_1$ = 0.3, $r_1$ = 0         1.4          1                          2
                                       14.6         15                         16
N = 20, $\rho_1$ = 0.7, $r_1$ = 0.5       –            2                          4
                                       21           20                         20
N = 20, $\rho_1$ = 0.9, $r_1$ = 0.5       –            0.4                        3
                                       –            20                         20





























[ 449 ] 


MISCELLANEA 


Revised upper percentage points of the extreme studentized deviate from the sample mean 


By H. A. DAVID 


University of Melbourne 


Let $x_n$ and $x_1$ denote the largest and the smallest value respectively in a random sample of $n$ observations
drawn from a normal population with standard deviation $\sigma$. The extreme studentized deviate from
the sample mean is defined as $(x_n - \bar{x})/s_\nu$ or $(\bar{x} - x_1)/s_\nu$, where $s_\nu$ is the usual root-mean-square estimator
of $\sigma$ based on $\nu$ degrees of freedom and is independent of the numerator. Tables of upper and lower 5 and
1% points and later also of 10, 2.5, 0.5 and 0.1% points were constructed by Nair (1948, 1952) for $n = 3$
to 9 and selected values of $\nu \ge 10$. The former set of tables is reproduced as Table 26 in Pearson & Hartley's
Biometrika Tables for Statisticians (1954).

Below we give a revised and slightly extended table of the upper percentage points of $(x_n - \bar{x})/s_\nu$,
which should be accurate except for possible errors of one unit in the last figure shown. Comparison with
Nair's values makes it clear that the studentization procedure employed by him tends to become
unsatisfactory for $\nu < 20$, as has been suspected by Pearson & Hartley (1954, p. 50, footnote). This effect
becomes the more pronounced as $n$ increases and the significance level $\alpha$ sharpens. For any given $\alpha$
the most severe correction is generally needed in the case $n = 9$, $\nu = 10$; in place of Nair's values 2.54,
2.93, 3.28, 3.67, 3.92, 4.40 for the upper 10, 5, 2.5, 1, 0.5, 0.1% points we find 2.50, 2.89, 3.29, 3.82, 4.24,
5.3.

The present table is based primarily on a simple approximation of a type recently discussed by the
writer (David, 1956). Let $u_n = x_n - \bar{x}$. Since $u_n$ is the largest of $n$ deviations, $x - \bar{x}$, we have the
approximate relation, valid for large values of $R$ (constant),
\[ \Pr(u_n/s_\nu > R) \doteq n \Pr[(x - \bar{x})/s_\nu > R]. \]
It follows that the upper $100\alpha$% point of $u_n/s_\nu$ is given approximately by
\[ [(n-1)/n]^{1/2}\, t_\nu(\alpha/n), \qquad (1) \]
where $t_\nu(\alpha/n)$ is the upper $100\alpha/n$ percentage point of a $t$ variate with $\nu$ degrees of freedom. As (1) slightly
overestimates the true percentage points the approximate values were corrected by reference to a
framework of exact percentage points constructed by numerical integration. These exact values were
obtained as the solutions, $R_\alpha$, of the equation
\[ \int_0^\infty f(u_n) \int_0^{u_n/R_\alpha} f(s_\nu)\, ds_\nu\, du_n = \alpha. \]
Here $f(u_n)$, the probability density function of $u_n$, was found by differencing of Grubbs's (1950) table of
the cumulative distribution function of $u_n$, and the inner integral evaluated from tables of the
incomplete $\Gamma$-function. In the least favourable case ($\alpha = 0.10$, $n = 12$, $\nu = 10$) the use of (1) led to an error of
only 0.07.
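
Approximation (1) is easy to evaluate numerically. The following is a minimal sketch using only the Python standard library: the upper tail of the $t$ distribution is obtained by Simpson's rule and the percentage point by bisection, so it assumes $\nu$ is not very small and the tail probability not extremely small (the bisection bracket stops at 50). All function names are illustrative and are not part of the paper.

```python
import math


def t_pdf(x, nu):
    """Density of Student's t with nu degrees of freedom."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return c * (1 + x * x / nu) ** (-(nu + 1) / 2)


def t_upper_tail(x, nu, steps=2000):
    """P(T > x) for x >= 0, integrating the density over [0, x] by Simpson's rule."""
    if x == 0:
        return 0.5
    h = x / steps
    total = t_pdf(0.0, nu) + t_pdf(x, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 - total * h / 3


def t_upper_point(p, nu):
    """Upper 100p% point of t (0 < p < 0.5), found by bisection on [0, 50]."""
    lo, hi = 0.0, 50.0
    for _ in range(80):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, nu) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def approx_upper_point(alpha, n, nu):
    """Approximation (1): [(n-1)/n]^(1/2) * t_nu(alpha/n)."""
    return math.sqrt((n - 1) / n) * t_upper_point(alpha / n, nu)


if __name__ == "__main__":
    # Case n = 9, nu = 10, upper 5% point; the text quotes a revised value of 2.89.
    print(round(approx_upper_point(0.05, 9, 10), 2))
```

As the text notes, (1) slightly overestimates the true percentage point, so the value printed for $n = 9$, $\nu = 10$ should be expected to lie a little above the revised tabular entry.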


I am much indebted to Misses Betty Laby and Gwenda Jacobs for extensive assistance with the 
computations. 


REFERENCES 


David, H. A. (1956). Biometrika, 43, 85.

Grubbs, F. E. (1950). Ann. Math. Statist. 21, 27.

Nair, K. R. (1948). Biometrika, 35, 118.

Nair, K. R. (1952). Biometrika, 39, 189.

Pearson, E. S. & Hartley, H. O. (1954). Biometrika Tables for Statisticians, 1.
Cambridge University Press.

























































































































Table 1. Upper percentage points of the extreme studentized deviate from the sample mean, $(x_n - \bar{x})/s_\nu$. [Tabulated values illegible in this copy.]
a - _ 



































Table 1 (cont.). 1% points, 0.5% points and 0.1% points. [Tabulated values illegible in this copy.]

































Exact linear sequential tests for the mean of a normal distribution 


By J. TAYLOR 
Food Research Department, Unilever Limited 


1. INTRODUCTION 


Wald (1947, chapter 7) described sequential probability ratio tests for the mean of a normal distribution 
of known variance. He gave formulae for their operating characteristics and mean sample size and, 
recognizing that these were approximate, also set limits to these quantities. For tests in which the mean 
sample size is small, i.e. in which the alternative hypotheses differ greatly, the errors of the first and 
second kind are much less than the nominal values. The mean sample size is greater than the nominal 
value and, although it is well beneath that of a fixed-sample-size test with the same actual errors, it may 
be close to that of such a test with the same nominal errors. Baker (1950) used random normal deviates 
to investigate the test for one pair of alternative hypotheses. For these, the error of the first kind was 
approximately the mean of Wald’s limits and the mean sample size was approximately Wald’s upper 
limit. 

Sequential tests whose actual errors are close to pre-assigned values are desirable, and no practicable 
direct method of obtaining them has been proposed. It is suggested that they be obtained by calculating 
the errors of given tests and using interpolation. One way would be to use Baker’s approximation, but 
this would depend on results for only one pair of alternative hypotheses. Another method would be 
to use an improved approximation to the operating characteristic, due to Page (1954a, b), and this is
outlined in §3 of this paper. The present author has used a method based on random samples to obtain 
tests for which the errors of the first and second kind are both 0-05, and the results of his investigation 
are presented in the remaining sections. 


2. NOTATION 
We shall assume that $x$ is normally distributed with mean $\theta$ and standard deviation 1, and that the
alternative hypotheses are
\[ H_0\colon\ \theta = \theta_0, \qquad H_1\colon\ \theta = \theta_1. \]
The errors of the first and second kind are to be $\alpha$ and $\beta$.


3. WALD’s TEST AND PAGE’S IMPROVED APPROXIMATION 


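Wald's test consists of taking observations $x_i$ in sequence and plotting
\[ y_m = \sum_{i=1}^{m} x_i \]
against $m$ (see Fig. 1). The boundaries $L_0$ and $L_1$ are the lines
\[ y_m = \frac{1}{\theta_1 - \theta_0} \ln\frac{\beta}{1-\alpha} + m\,\frac{\theta_0 + \theta_1}{2} \]
and
\[ y_m = \frac{1}{\theta_1 - \theta_0} \ln\frac{1-\beta}{\alpha} + m\,\frac{\theta_0 + \theta_1}{2} \qquad (1) \]
respectively (ln denotes a natural logarithm).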

The approximations in Wald's formulae are due to the fact that many schemes end some distance
beyond $L_0$ and $L_1$ instead of on those lines (i.e. they end on one of the full lines down from the $A_i$, or on
one of the broken lines up from the $B_i$, and not at points $A_i$ or $B_i$). One consequence is that if equations
(1) are used to calculate boundaries with, say, $\alpha = \beta = 0.05$, the scheme will have errors which are
somewhat less than 0.05.

Page (1954a) has shown how the true operating characteristic can be calculated by solution of an
integral equation, and (1954b) has discussed methods of obtaining the solution. His notation
corresponds to ours if we take $\theta = \tfrac12(\theta_1 - \theta_0)$,













and use the probabilities (as given in his Table 2) for $Z = \tfrac12 h$. These are the values of $\alpha = \beta$ for a scheme
with boundaries
\[ y_m = -Z + m\,\frac{\theta_0 + \theta_1}{2} \]
and
\[ y_m = Z + m\,\frac{\theta_0 + \theta_1}{2}. \qquad (2) \]

The table is calculated for a given value of $Z$ and may then be used to find the value of $\theta$ for which
$\alpha = \beta$ has a desired value. Thus interpolation in Page's table shows that, for $Z = 5$, we shall obtain
$\alpha = \beta = 0.05$ for $\theta = 0.26$, approximately. Similar calculations for other values of $Z$ would give a curve
connecting $\theta$ and $Z$ for $\alpha = \beta = 0.05$. Tests with $\alpha \ne \beta$ could be investigated by taking $Z \ne \tfrac12 h$.





Fig. 1. Boundaries for a linear sequential scheme.


4. AN ALTERNATIVE APPROACH 


Another approach is to fix $\theta_1 - \theta_0$ and calculate the values of $\alpha = \beta$ for various values of $Z$. There is no
loss of generality if we take $\theta_0 = 0$ and the problem becomes that of determining the probability of
crossing the upper boundary when the $x_i$ are random normal deviates with mean 0 and standard
deviation 1. Let $P_{0j}$ ($P_{1j}$) be the probability that the lower (upper) boundary is crossed at $m = j$. We can
calculate $P_{01}$ and $P_{11}$ from tables of the normal distribution; also $P_{02}$ and $P_{12}$ using the tetrachoric series
(Kendall, 1941), as the joint distribution of $y_1$ and $y_2$ is bivariate normal with $\rho = \sqrt{0.5}$. Further
calculation is impracticable, but recourse may be made to the use of random sequences. The first 20,000 deviates
in Wold's tables (1948) have been used to construct 2000 sequences of ten values ($y_1 \ldots y_{10}$). These have
been used for several sets of ($\theta_1$, $Z$) in which $\theta_1$ is relatively large. Sequences which did not cross a
boundary for $m \le 10$ were completed from the remaining 5000 deviates in the tables. If $N_c$ of the random
sequences continued beyond $m = 2$ and $U_c$ of these finally crossed the upper boundary, and if
\[ P_c = 1 - P_{01} - P_{11} - P_{02} - P_{12}, \]
then the estimated errors of the scheme are
\[ \hat{\alpha} = \hat{\beta} = P_{11} + P_{12} + P_c\,\frac{U_c}{N_c}, \]
and the standard error of $\hat{\alpha}$ is
\[ P_c \left[ \frac{U_c}{N_c} \left( 1 - \frac{U_c}{N_c} \right) \Big/ N_c \right]^{1/2}. \]
Also if $\bar{n}_c$ is the mean sample size for these $N_c$ sequences, with standard error $s_c$, the estimated mean
sample size for the scheme as a whole (when $H_0$ or $H_1$ is true) is
\[ \bar{n}_{0,1} = (P_{01} + P_{11}) + 2(P_{02} + P_{12}) + P_c \bar{n}_c \]
with standard error $P_c s_c$.
It may be noted that the proportion of sequences crossing the boundaries for m = 1 or 2 always 
agreed, within limits of sampling error, with the calculated values. 
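
The sampling idea of this section can be imitated with pseudo-random numbers. The sketch below is not the author's procedure: it replaces Wold's printed deviates by Python's `random.gauss`, simulates every step (including $m = 1$ and 2, which the paper treats exactly), and takes $\theta_0 = 0$ with the symmetric boundaries $y_m = \pm Z + \tfrac12 m\theta_1$. Function names and the default number of sequences are illustrative assumptions.

```python
import random


def run_scheme(theta1, Z, true_mean, rng, max_m=10_000):
    """Run one linear sequential scheme with theta0 = 0.

    Returns (decision, m): decision is 1 if the upper boundary is crossed
    (accept H1) and 0 if the lower boundary is crossed (accept H0).
    """
    y = 0.0
    for m in range(1, max_m + 1):
        y += rng.gauss(true_mean, 1.0)
        mid = 0.5 * m * theta1
        if y >= mid + Z:
            return 1, m
        if y <= mid - Z:
            return 0, m
    # Safety net only; with these boundaries a scheme practically never gets here.
    return (1 if y > 0.5 * max_m * theta1 else 0), max_m


def estimate_errors(theta1, Z, n_seq=20_000, seed=1):
    """Estimate the first-kind error and mean sample size when H0 (mean 0) is true."""
    rng = random.Random(seed)
    wrong = 0
    total_m = 0
    for _ in range(n_seq):
        decision, m = run_scheme(theta1, Z, 0.0, rng)
        wrong += decision
        total_m += m
    return wrong / n_seq, total_m / n_seq


if __name__ == "__main__":
    alpha_hat, n_bar = estimate_errors(1.0, 2.5)
    print(alpha_hat, n_bar)
```

Because every sequence is simulated in full, the estimates carry somewhat more sampling error per sequence than the paper's combination of exact probabilities for $m = 1, 2$ with random sequences thereafter.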














454 Miscellanea 


5. RESULTS OBTAINED BY THE SAMPLING METHOD 


Table 1 gives the principal results. In order that efficiencies can be compared, the sizes $n_f$ of
fixed-sample-size tests with the same $\alpha$, $\beta$ and $\theta_1$ are given as calculated, without rounding-off to the next
highest integer.

For each value of $\theta_1$ the probits of the values of $\hat{\alpha}$ are approximately linearly related to $Z$.
Interpolation gives the values $Z$ (in Table 2) for schemes having $\alpha = \beta = 0.05$. The standard errors of the
$Z$ cannot be calculated, because the errors of the $\hat{\alpha}$ for different schemes are not independent (as the same
random sequences were used for all schemes). However, we may be confident that the standard error of
the significance level for each $Z$ is under 0.005 and so the true significance level for each of the schemes
given will be between 0.04 and 0.06. This degree of approximation is much less than in Wald's scheme.


Table 1. Characteristics of some linear sequential schemes 




































































          Significance level      Mean sample size
$\theta_1$      $Z$      $\hat{\alpha}$        S.E.        $\bar{n}_{0,1}$     S.E.      $n_f$       Efficiency ($n_f/\bar{n}_{0,1}$)

1.00    1.75   0.0988   0.0054      4.14     0.05      6.64     1.60
        2.00   0.0798   0.0052      4.71     0.06      7.91     1.68
        2.25   0.0647   0.0049      5.33     0.07      9.20     1.73
        2.50   0.0481   0.0044      6.03     0.08     11.07     1.84

1.25    1.50   0.0776   0.0043      3.15     0.03      5.17     1.64
        1.75   0.0593   0.0041      3.63     0.04      6.24     1.72
        2.00   0.0439   0.0038      4.15     0.05      7.44     1.79

1.50    1.00   0.0863   0.0031      2.06     0.02      3.31     1.61
        1.25   0.0643   0.0032      2.44     0.02      4.11     1.68
        1.50   0.0477   0.0032      2.83     0.08      4.95     1.75

| | _| 
Table 2. Characteristics of interpolated schemes

$\theta_1$      $Z$      $Z_W$     $Z - Z_W$     $\bar{n}_{0,1}$     $n_f$      Efficiency ($n_f/\bar{n}_{0,1}$)     $\bar{n}_{\max}$     S.E.

1.00    2.47   2.94    −0.47       5.93     10.82    1.82                   9.18      0.32
1.25    1.90   2.36    −0.46       3.92      6.92    1.77                   6.45      0.19
1.50    1.46   1.96    −0.50       2.76      4.81    1.74                   4.43      0.12

] 
Table 2 also gives values
\[ Z_W = \frac{\ln 19}{\theta_1}, \]
calculated from Wald's equations (1) to correspond, for the given $\theta_1$ and a nominal $\alpha = \beta = 0.05$, to the
$Z$ in equations (2). The differences $Z - Z_W$ show no regular trend and have a mean of −0.48, so it is
recommended that for other values of $\theta_1$ in this range we should take
\[ Z = \frac{\ln 19}{\theta_1} - 0.48. \]
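
The quantities in Table 2 that follow from closed formulae can be recomputed directly. In the sketch below, `wald_Z` and `recommended_Z` implement the two expressions for $\alpha = \beta = 0.05$ given above; `fixed_sample_size` assumes that $n_f$ is the unrounded size $(2z_\alpha/\theta_1)^2$ of a one-sided fixed-sample test with $\alpha = \beta$ and unit standard deviation, which is an inference from the tabulated values rather than a formula stated in the paper. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def wald_Z(theta1, alpha=0.05):
    """Z_W from Wald's boundaries (1) with theta0 = 0 and alpha = beta."""
    return math.log((1 - alpha) / alpha) / theta1


def recommended_Z(theta1):
    """Z = ln(19)/theta1 - 0.48, suggested in the text for theta1 between 1.0 and 1.5."""
    return math.log(19.0) / theta1 - 0.48


def fixed_sample_size(alpha, theta1):
    """Assumed form of n_f: unrounded size of a fixed-sample test with alpha = beta."""
    z = NormalDist().inv_cdf(1 - alpha)
    return (2 * z / theta1) ** 2


if __name__ == "__main__":
    for theta1 in (1.00, 1.25, 1.50):
        print(theta1,
              round(wald_Z(theta1), 2),
              round(recommended_Z(theta1), 2),
              round(fixed_sample_size(0.05, theta1), 2))
```

The recommended $Z$ differs slightly from the interpolated $Z$ of Table 2 for each $\theta_1$, because the constant 0.48 is the mean of the three tabulated differences.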
The values $\bar{n}_{0,1}$ have been interpolated from Table 1. They show that the efficiency of the schemes is
good. The mean sample size when neither $H_0$ nor $H_1$ is true is also important, and will be a maximum
($\bar{n}_{\max}$) when $\theta = \tfrac12\theta_1$. For each $\theta_1$ considered, 500 random sequences were used to obtain the values of
$\bar{n}_{\max}$ and their standard errors. It will be seen that they are less than $n_f$. Thus these schemes may
confidently be recommended, when they can be applied, as always having lower mean sample sizes
than fixed-sample-size tests.

6. TRUNCATION 


It is a disadvantage of sequential schemes that they occasionally require very large sample sizes, and
it is sometimes recommended that sampling should stop at two or three times the expected mean sample
size. For the test given in this paper it would then be reasonable to accept $H_0$ or $H_1$ according to whether
$y_m$ was less than or greater than $\tfrac12 m\theta_1$ (i.e. below or above the mid-point of $A_m B_m$). This would give
a test with a = p>@. 

The mean sample size ($\bar{n}_t$, say) would be less than $\bar{n}_{0,1}$, but the increase in $\alpha$ and $\beta$ reduces the size of
the equivalent fixed-sample-size test. Table 3 shows the effect of truncating a particular linear scheme
at various values of $m$.


Table 3. Effect of truncation for $\theta_1 = 1.00$, $Z = 2.50$

$m$       $\alpha$ (= $\beta$)      $\bar{n}_t$       $n_f$       Efficiency ($n_f/\bar{n}_t$)

6       0.1244       4.50      5.32     1.18
9       0.0793       5.32      7.95     1.49
12      0.0654       5.70      9.13     1.60
15      0.0530       5.88     10.45     1.78
∞       0.0481       6.03     11.07     1.84

! | 





Thus truncation below $m = 15$ causes an appreciable increase in $\alpha$ and loss in efficiency. Similar
calculations have been made for the other values of $\theta_1$ and $Z$ shown in Table 1, and it has been found
generally true that truncation at $2.5\bar{n}_{0,1}$ is satisfactory, but not at lower values of $m$.

An alternative rule for action at the point of truncation would be to accept $H_0$ if the upper boundary
had not been crossed, i.e. a path reaching any point on $A_m B_m$ would result in the acceptance of $H_0$.
This gives a guarantee that a is less than 2, but § becomes greater than @ and the efficiency is appreciably 
less than for the symmetrical truncation considered above. 


7. SUMMARY 


Wald’s test for the mean of a normal distribution with known standard deviation involves an approxi- 
mation which is poor when the alternative hypotheses differ greatly. The characteristics of some linear 
sequential tests have been calculated, and it is found that they have the high efficiency customary in 
sequential probability ratio tests. Some schemes for which the errors of the first and second kinds are 
0-05 have been deduced. The truncation of such schemes has also been investigated and it has been 
concluded that this should not be done at less than 2-5 times the mean sample size on the null hypothesis. 

Attention is drawn to a method, due to Page, of calculating the operating characteristic of Wald 
schemes. This provides another method of calculating schemes with prescribed errors. 


The author thanks the directors of Unilever Limited for permission to publish this paper. 
He also thanks Miss R. M. Stimson, who did most of the computing involved. 


REFERENCES 


Baker, A. G. (1950). Properties of some tests in sequential analysis. Biometrika, 37, 334-46.

Kendall, M. G. (1941). Proof of relations connected with the tetrachoric series and its generalization.
Biometrika, 32, 196-8.

Page, E. S. (1954a). An improvement to Wald's approximation for some properties of sequential
tests. J. R. Statist. Soc. B, 16, 136-9.

Page, E. S. (1954b). The Monte Carlo solution of some integral equations. Proc. Camb. Phil. Soc.
50, 414-25.

Wald, A. (1947). Sequential Analysis. New York: John Wiley and Sons, Inc.

Wold, H. (1948). Tracts for Computers. No. XXV. Random Normal Deviates. Cambridge University
Press.










On the sum of squares of normal scores 


By H. RUBEN 
Statistical Laboratory, Manchester University 


Let $x_{r|n}$ ($r = 1, 2, \ldots, n$) be the $r$th order statistic, in ascending order of magnitude, for a sample of $n$
items drawn independently and at random from a standardized normal distribution. We shall obtain
an explicit expression for $S_n = \sum_{r=1}^{n} a_{r|n}^2$, where $a_{r|n}$ is the expected value of $x_{r|n}$. The quantity $S_n$ is of
some importance in the analysis of ranked data and has been tabulated by Fisher & Yates (1953). It
will appear that S, can be expressed in terms of the contents of regular hyperspherical simplices with 
primary angles cos—! ( — 4) or equivalently in terms of the expectations of a set of extreme order statistics. 
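Define
\[ a_{r|n} = n \binom{n-1}{r-1} \int_{-\infty}^{\infty} x\, p^{r-1} q^{n-r} z\, dx \qquad (n = 1, 2, \ldots;\ r = 1, 2, \ldots, n), \qquad (1) \]
where
\[ z = (2\pi)^{-1/2} e^{-\frac12 x^2}, \qquad (2) \]
\[ p = \int_{-\infty}^{x} (2\pi)^{-1/2} e^{-\frac12 \xi^2}\, d\xi, \qquad (3) \]
\[ q = \int_{x}^{\infty} (2\pi)^{-1/2} e^{-\frac12 \xi^2}\, d\xi, \qquad (4) \]
and
\[ S_n = \sum_{r=1}^{n} a_{r|n}^2. \qquad (5) \]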
It is known (Ruben, 1954) that a, | , can be expressed in terms of the contents of hyperspherical simplices, 
constructed on the surfaces of spheres immersed in a space of dimensionality n — 2, with primary angles 
either cos! (— 4) or 7—cos~!(— 4). In the particular case r = 1 or r = n the simplices are regular, all 
primary angles then being cos~1(— 4). 
Let $u^{(-m)}$ and $u^{(m)}$ denote the ascending and descending factorials, respectively, of degree $m$ in $u$, i.e.
$$u^{(-0)} = 1,\qquad u^{(-m)} = u(u+1)\cdots(u+m-1)\quad (m = 1, 2, \ldots), \tag{6}$$
$$u^{(0)} = 1,\qquad u^{(m)} = u(u-1)\cdots(u-m+1)\quad (m = 1, 2, \ldots), \tag{7}$$
and let $Q_{k,n}(r)$ denote the Tchebycheff-Hermite orthogonal polynomial of degree $k$ in $r$ for $r = 1, 2, \ldots, n$, which has been normalized to make the sum of squares of the values of each polynomial unity. Thus
$$Q_{k,n}(r) = K_{k,n}\sum_{j=0}^{k}(-)^{k-j}\,\frac{(k!)^2\,(k+j)!\,(n-j-1)!}{(2k)!\,(j!)^2\,(k-j)!\,(n-k-1)!}\,(r-1)^{(j)}$$
$$(k = 0, 1, \ldots, n-1;\ n = 1, 2, \ldots;\ r = 1, 2, \ldots, n), \tag{8}$$
where
$$K_{k,n} = \frac{(2k)!}{(k!)^2}\left[\frac{(2k+1)\,(n-k-1)!}{(n+k)!}\right]^{\frac12},$$
and
$$\sum_{r=1}^{n} Q_{k,n}(r)\,Q_{l,n}(r) = \begin{cases} 1 & (l = k),\\ 0 & (l \neq k). \end{cases} \tag{9}$$
From (9)
$$S_n = (Q_{0,n}(1)a_{1|n} + Q_{0,n}(2)a_{2|n} + \ldots + Q_{0,n}(n)a_{n|n})^2 + \ldots + (Q_{n-1,n}(1)a_{1|n} + Q_{n-1,n}(2)a_{2|n} + \ldots + Q_{n-1,n}(n)a_{n|n})^2. \tag{10}$$
Now $a_{r|n} = -a_{n-r+1|n}$ (this follows readily from (1)) and furthermore for even $k$, $Q_{k,n}(x)$ is even about $x = \tfrac12(n+1)$. Equation (10) may therefore be written in the form
$$S_n = (Q_{1,n}(1)a_{1|n} + Q_{1,n}(2)a_{2|n} + \ldots + Q_{1,n}(n)a_{n|n})^2 + (Q_{3,n}(1)a_{1|n} + Q_{3,n}(2)a_{2|n} + \ldots + Q_{3,n}(n)a_{n|n})^2 + \ldots + (Q_{m,n}(1)a_{1|n} + Q_{m,n}(2)a_{2|n} + \ldots + Q_{m,n}(n)a_{n|n})^2, \tag{10'}$$













where $m$ is $n - 1$ or $n - 2$ according as to whether $n$ is even or odd. To evaluate the separate terms on the right-hand side of equation (10'), note that
$$\begin{aligned}
\sum_{r=1}^{n} Q_{k,n}(r)\,a_{r|n}
&= K_{k,n}\sum_{j=0}^{k}(-)^{k-j}\frac{(k!)^2 (k+j)!\,(n-j-1)!}{(2k)!\,(j!)^2 (k-j)!\,(n-k-1)!}\sum_{r=1}^{n}(r-1)^{(j)}\,n\binom{n-1}{r-1}\int_{-\infty}^{\infty} x\,p^{r-1}q^{n-r}z\,dx\\
&= nK_{k,n}\sum_{j=0}^{k}(-)^{k-j}\frac{(k!)^2 (k+j)!\,(n-j-1)!}{(2k)!\,(j!)^2 (k-j)!\,(n-k-1)!}\int_{-\infty}^{\infty} x\left[\sum_{r=1}^{n}(r-1)^{(j)}\binom{n-1}{r-1}p^{r-1}q^{n-r}\right]z\,dx\\
&= nK_{k,n}\sum_{j=0}^{k}(-)^{k-j}\frac{(k!)^2 (k+j)!\,(n-j-1)!}{(2k)!\,(j!)^2 (k-j)!\,(n-k-1)!}\,(n-1)^{(j)}\int_{-\infty}^{\infty} x\,p^{j}z\,dx\\
&= nK_{k,n}\,(n-1)^{(k)}\sum_{j=0}^{k}(-)^{k-j}\frac{(k!)^2 (k+j)!}{(2k)!\,(k-j)!\,(j!)^2}\int_{-\infty}^{\infty} x\,p^{j}z\,dx\\
&= nK_{k,n}\,(n-1)^{(k)}\,\frac{1}{2\sqrt\pi}\sum_{j=1}^{k}(-)^{k-j}\frac{(k!)^2 (k+j)!}{(2k)!\,(k-j)!\,j!\,(j-1)!}\,u_{j-1}(\cos^{-1}(-\tfrac13)),
\end{aligned}\tag{11}$$
where $u_{j-1}(\cos^{-1}(-\tfrac13))$ is the relative content of a regular hyperspherical simplex constructed on the surface of a sphere immersed in $(j-1)$-dimensional space, the angle between any two bounding, diametral planes being $\cos^{-1}(-\tfrac13)$. The relationship between integrals of the form $\int x\,p^{j}z\,dx$ and the $u$'s has been demonstrated in a previous paper in which values of $u_m(\cos^{-1} 1/x)$ have been tabulated for $x = 2(1)12$ and for various values of $m$ (Ruben, 1954, Table 1). Also, in the derivation of (11) use has been made of the fact that the sum of the series in the square brackets is the $j$th factorial moment of a random variable which has a binomial distribution with index $n - 1$ and probability $p$; this factorial moment is $(n-1)^{(j)}p^{j}$.
On substitution of $K_{k,n}$ in (11) and squaring,








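$$\left(\sum_{r=1}^{n} Q_{k,n}(r)\,a_{r|n}\right)^2 = n\,\frac{(n-1)^{(k)}}{(n+1)^{(-k)}}\,\alpha_k, \tag{12}$$
where
$$\alpha_k = \frac{2k+1}{4\pi}\left[\sum_{j=1}^{k}(-)^{k-j}\frac{(k+j)!}{(k-j)!\,j!\,(j-1)!}\,u_{j-1}(\cos^{-1}(-\tfrac13))\right]^2 \quad (k = 0, 1, 2, \ldots); \tag{13}$$
the even $\alpha$'s being all zero, as implied in the remarks made immediately after equation (10).
Hence, finally, $S_n$ is expressed as a linear function of the first $[\tfrac12 n]$ odd $\alpha$'s:
$$S_n = n\left\{\frac{n-1}{n+1}\,\alpha_1 + \frac{(n-1)(n-2)(n-3)}{(n+1)(n+2)(n+3)}\,\alpha_3 + \ldots\right\}. \tag{14}$$
Evidently since
$$\lim_{n\to\infty}\frac{1}{n}\,(a_{1|n}^2 + a_{2|n}^2 + \ldots + a_{n|n}^2) = E(x^2) = 1, \tag{15}$$
$x$ being a standardized normal variate, we must have
$$\alpha_1 + \alpha_3 + \alpha_5 + \ldots = 1. \tag{16}$$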
We note that only the first two of the odd $\alpha$'s are expressible in terms of elementary functions. In fact, since $u_0(\theta) = 1$, $u_1(\theta) = \tfrac12$, $u_2(\theta) = \theta/2\pi$,
$$\alpha_1 = \frac{3}{\pi},\qquad \alpha_3 = \frac{63}{\pi}\left(10\,\frac{\cos^{-1}(-\tfrac13)}{2\pi} - 3\right)^2.$$
The reason for this is that $\alpha_3$ involves the relative content of an arc of a circle (subtending an angle $\cos^{-1}(-\tfrac13)$ at the centre); on the other hand, $\alpha_5, \alpha_7, \ldots$ involve the relative contents of regular spherical tetrahedra, pentahedra, etc., and these cannot be expressed in terms of elementary trigonometric functions (cf. Ruben, 1954). The values of the first six odd $\alpha$'s are given below:

     k      α_k
     1      0.954929658728
     3      0.033492010690
     5      0.00667477593
     7      0.002278473
     9      0.0010182
    11      0.00058










Fisher & Yates (1953, Table XXI) present $S_n$ to 4 decimal places. These values were computed from the values of $a_{r|n}$, given to 2 decimal places in Table XX, by squaring and adding. An upper bound to the absolute error of the tabulated $S_n$ is therefore* $2 \times 0.005 \times \sum_{r=1}^{n}|a_{r|n}| \approx 0.008n$ (about), since
$$\lim_{n\to\infty}\frac{1}{n}\,(|a_{1|n}| + \ldots + |a_{n|n}|) = E(|x|) = \sqrt{(2/\pi)},$$
$x$ being again a standardized normal variate. Thus $S_{50}$, given as 47.3830 in the tables, may be in error by as much as 0.4, i.e. by somewhat less than 1%.

The values of the $\alpha$'s tabulated above allow the computing of $S_n$ for values of $n$ from 1 to 13 inclusive with great precision by means of equation (14), the last term in the right-hand member of (14) involving $\alpha_{11}$ for $n = 12$ and $n = 13$. Upper bounds for the errors, induced by the errors of the tabulated $\alpha$'s, in $S_{12}$ and $S_{13}$ are $6.1 \times 10^{-5}$ and $6.6 \times 10^{-5}$ respectively, and are very much less for lower values of $n$. Thus the upper bounds for the errors in $S_{10}$ and $S_{11}$ are $5.0 \times 10^{-7}$ and $5.5 \times 10^{-7}$ respectively.

For values of $n > 13$ higher order $\alpha$'s are required if all the terms in the series of (14) are to be used. However, this is not strictly necessary, since high accuracy may be obtained by including terms as far as that involving $\alpha_{11}$ only. Fisher & Yates's tables may thus be extended to cover values of $n > 50$. The error in $S_n$ now consists of a part due to the non-inclusion of terms involving $\alpha_{13}, \alpha_{15}, \ldots$ as well as (just as for $n \leqslant 13$) of a relatively small part due to the errors of the tabulated values of $\alpha_1, \ldots, \alpha_{11}$. An upper bound to the total error for $S_{50}$ is 0.05, while for $S_{100}$ it is 0.10. More generally for moderate or high values of $n$ it is about 0.1% of the value of $S_n$.
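
As a rough numerical illustration of equation (14), the following Python sketch evaluates $S_n$ from the six tabulated odd $\alpha$'s. The names `ALPHA` and `sum_sq_normal_scores` are illustrative and not part of the paper; the $\alpha$ values are copied from the table above, so the result is complete only for $n \leqslant 13$ and omits the terms in $\alpha_{13}, \alpha_{15}, \ldots$ for larger $n$.

```python
import math

# Odd-order alphas as tabulated in the text (k -> alpha_k).
ALPHA = {1: 0.954929658728, 3: 0.033492010690, 5: 0.00667477593,
         7: 0.002278473, 9: 0.0010182, 11: 0.00058}


def sum_sq_normal_scores(n):
    """Evaluate S_n from equation (14) using the tabulated alphas.

    The coefficient of alpha_k is n * prod_{i=1..k} (n - i) / (n + i),
    which vanishes automatically as soon as k >= n.
    """
    total = 0.0
    for k, alpha in ALPHA.items():
        coeff = 1.0
        for i in range(1, k + 1):
            coeff *= (n - i) / (n + i)
        total += coeff * alpha
    return n * total


if __name__ == "__main__":
    for n in (2, 4, 10, 13):
        print(n, sum_sq_normal_scores(n))
```

For $n = 2$ only the $\alpha_1$ term survives, so the function returns $\tfrac23\alpha_1$, which agrees with the directly computed value $2/\pi$.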


REFERENCES 


Fisher, R. A. & Yates, F. (1953). Statistical Tables for Biological, Agricultural and Medical Research. 4th ed. Edinburgh: Oliver and Boyd.

Ruben, H. (1954). On the moments of order statistics in samples from normal populations. Biometrika, 41, 200-27.


On the moments of the range and product moments of extreme 
order statistics in normal samples 


By H. RUBEN 
Statistical Laboratory, Manchester University 


It has been shown in a recent paper (Ruben, 1954) that the moments of order statistics in samples from 
normal populations are expressible as linear functions of the contents of certain hyperspherical simplices. 
This note will demonstrate that the product moments of the extreme order statistics in samples with an 
even number of items from normal populations are expressible as linear functions of the products of 
the contents of these simplices (equation (4)). The odd moments of the range when the number of items 
is odd, and the even moments when the number of items is even, may be expressed in a similar manner. 
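The joint probability density function of the extreme order statistics in normal samples of size $n$ is
$$n(n-1)\,[F(v) - F(u)]^{n-2} f(u) f(v) \quad (v \geqslant u), \tag{1}$$
where
$$f(u) = \frac{1}{\sqrt{(2\pi)}}\,e^{-\frac12 u^2}, \tag{2}$$
$$F(u) = \frac{1}{\sqrt{(2\pi)}}\int_{-\infty}^{u} e^{-\frac12\xi^2}\,d\xi. \tag{3}$$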


* Actually, it is very slightly less for odd $n$, since $a_{r|n}$ is accurately zero when $r = \tfrac12(n+1)$ and $n$ is odd.




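Hence, for $n$ even,
$$\begin{aligned}
E(uv) &= n(n-1)\iint_{v>u} [F(v) - F(u)]^{n-2} f(u) f(v)\,uv\,du\,dv\\
&= \tfrac12 n(n-1)\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} [F(v) - F(u)]^{n-2} f(u) f(v)\,uv\,du\,dv\\
&= \tfrac12 n(n-1)\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} \sum_{k=0}^{n-2}(-)^k\binom{n-2}{k}[F(u)]^{k}[F(v)]^{n-2-k} f(u) f(v)\,uv\,du\,dv\\
&= \tfrac12 n(n-1)\sum_{k=0}^{n-2}(-)^k\binom{n-2}{k}\int_{-\infty}^{\infty} u[F(u)]^{k} f(u)\,du \int_{-\infty}^{\infty} v[F(v)]^{n-2-k} f(v)\,dv\\
&= \frac{1}{8\pi}\,n(n-1)\sum_{k=0}^{n-2}(-)^k\binom{n-2}{k}\,k(n-k-2)\,u_{k-1}(\cos^{-1}(-\tfrac13))\,u_{n-k-3}(\cos^{-1}(-\tfrac13)) \quad (n = 2, 4, 6, \ldots),
\end{aligned}\tag{4}$$
where $u_m(\theta)$ is the ratio of the content of a regular (hyper) spherical simplex, with primary angles $\theta$, to the content of the surface of a sphere immersed in $m$-dimensional space, the simplex being constructed on the surface of such a sphere. In the derivation of equation (4), use has been made of the result (Ruben, 1954) that the expectation of the upper extreme statistic in a normal sample of size $t$ is
$$t\int_{-\infty}^{\infty} u[F(u)]^{t-1} f(u)\,du = \frac{t(t-1)}{2\sqrt\pi}\,u_{t-2}(\cos^{-1}(-\tfrac13)) \quad (t = 1, 2, \ldots). \tag{5}$$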


The relationship between the present w’s and the %’s of the previous paper is w,,(cos~! — 1/a) = %,,(x). 
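Equation (4) may be written in slightly simpler form, as follows:
$$\begin{aligned}
E(uv) &= \frac{n(n-1)}{8\pi}\sum_{k=1}^{n-3}(-)^k\binom{n-2}{k}\,k(n-k-2)\,u_{k-1}(\cos^{-1}(-\tfrac13))\,u_{n-k-3}(\cos^{-1}(-\tfrac13))\\
&= \frac{n(n-1)}{8\pi}\sum_{k=1}^{n-3}(-)^k\frac{(n-2)!}{(k-1)!\,(n-k-3)!}\,u_{k-1}(\cos^{-1}(-\tfrac13))\,u_{n-k-3}(\cos^{-1}(-\tfrac13))\\
&= -\frac{n(n-1)(n-2)(n-3)}{8\pi}\sum_{k=0}^{n-4}(-)^k\binom{n-4}{k}\,u_{k}(\cos^{-1}(-\tfrac13))\,u_{n-k-4}(\cos^{-1}(-\tfrac13)) \quad (n = 2, 4, \ldots).
\end{aligned}\tag{6}$$
The terms in the series of equation (6) corresponding to $k$ and to $n - 4 - k$ are equal so that yet another version of equation (4) more adapted to computation is
$$E(uv) = -\frac{n(n-1)(n-2)(n-3)}{4\pi}\sum_{k=0}^{\frac12 n - 3}(-)^k\binom{n-4}{k}\,u_{k}(\cos^{-1}(-\tfrac13))\,u_{n-k-4}(\cos^{-1}(-\tfrac13)) - \frac{n(n-1)(n-2)(n-3)}{8\pi}\,(-)^{\frac12 n}\binom{n-4}{\frac12(n-4)}\,u^2_{\frac12 n - 2}(\cos^{-1}(-\tfrac13)) \quad (n = 2, 4, \ldots). \tag{7}$$

A similar argument may be used to derive $E(v-u)^m$, the moments of the range $w_n$, based on a sample of size $n$ from normal populations, when $m$ and $n$ are both odd and when $m$ and $n$ are both even, since the integrands defining these moments will then be symmetrical in $u$ and $v$ as in the derivation of equation (4).

The moments themselves may then be expressed as sums of products of contents of simplices whose angles are either $\theta$ or $\pi - \theta$ where
$$\theta = \cos^{-1}(-\tfrac13),\ \cos^{-1}(-\tfrac14),\ \ldots,\ \cos^{-1}\left(-\frac{1}{m+2}\right).$$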


We shall not derive the relevant expressions here but shall instead use (4) to obtain the second moment only of $w_n$ for even $n$. We have
$$\begin{aligned}
E(w_n^2) &= E(v-u)^2\\
&= 2E(v^2) - 2E(uv)\\
&= 2 + \frac{n(n-1)(n-2)}{2\sqrt3\,\pi}\,u_{n-3}(\cos^{-1}(-\tfrac14)) + \frac{n(n-1)(n-2)(n-3)}{4\pi}\sum_{k=0}^{n-4}(-)^k\binom{n-4}{k}\,u_{k}(\cos^{-1}(-\tfrac13))\,u_{n-k-4}(\cos^{-1}(-\tfrac13)) \quad (n = 2, 4, \ldots),
\end{aligned}\tag{8}$$
on using equation (101) of an earlier paper (Ruben, 1954) to express $E(v^2)$ in the form used in equation (8).













It should be noted incidentally that for all $n$
$$E(w_n) = E(v-u) = 2E(v) = \frac{n(n-1)}{\sqrt\pi}\,u_{n-2}(\cos^{-1}(-\tfrac13)) \quad (n = 1, 2, \ldots), \tag{9}$$
on using equation (101') referred to earlier.
Equations (8) and (9) may be used to devise explicit expressions for the variances of w., w, and w, in 
terms of elementary trigonometric functions. Similar expressions for the variance of w,, whenn = 8, 10,... 
are not possible, since the formulae for these variances involve the contents of spherical tetrahedra, 
pentahedra, etc., and these cannot be expressed in terms of the elementary trigonometric functions 


(cf. Ruben, 1954, 1956). On the other hand, E(w%), as opposed to var w,, can be expressed in the desired 
form. In fact, 





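$$E(w_2) = \frac{2}{\sqrt\pi}\,u_0(\cos^{-1}(-\tfrac13)) = \frac{2}{\sqrt\pi}, \tag{10}$$
$$E(w_4) = \frac{12}{\sqrt\pi}\,u_2(\cos^{-1}(-\tfrac13)) = \frac{12}{\sqrt\pi}\,\frac{\cos^{-1}(-\tfrac13)}{2\pi}, \tag{11}$$
$$E(w_6) = \frac{30}{\sqrt\pi}\,u_4(\cos^{-1}(-\tfrac13)), \tag{12}$$
$$E(w_2^2) = 2, \tag{13}$$
$$E(w_4^2) = 2 + \frac{24}{2\sqrt3\,\pi}\,u_1(\cos^{-1}(-\tfrac14)) + \frac{24}{4\pi}\,u_0^2(\cos^{-1}(-\tfrac13)) = 2 + \frac{2\sqrt3}{\pi} + \frac{6}{\pi}, \tag{14}$$
$$\begin{aligned}
E(w_6^2) &= 2 + \frac{120}{2\sqrt3\,\pi}\,u_3(\cos^{-1}(-\tfrac14)) + \frac{360}{4\pi}\{2u_0(\cos^{-1}(-\tfrac13))\,u_2(\cos^{-1}(-\tfrac13)) - 2u_1^2(\cos^{-1}(-\tfrac13))\}\\
&= 2 + \frac{120}{2\sqrt3\,\pi}\,\frac{3\cos^{-1}(-\tfrac14) - \pi}{4\pi} + \frac{360}{4\pi}\left\{2\,\frac{\cos^{-1}(-\tfrac13)}{2\pi} - \frac12\right\}\\
&= 2 - \frac{15}{\sqrt3\,\pi} - \frac{45}{\pi} + \frac{15}{\pi^2}\{\sqrt3\cos^{-1}(-\tfrac14) + 6\cos^{-1}(-\tfrac13)\},
\end{aligned}\tag{15}$$
$$\operatorname{var} w_2 = 2 - \frac{4}{\pi}, \tag{16}$$
$$\operatorname{var} w_4 = 2 + \frac{6}{\pi} + \frac{2\sqrt3}{\pi} - \frac{36}{\pi^3}\,(\cos^{-1}(-\tfrac13))^2. \tag{17}$$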
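
The elementary expressions for the smallest even sample sizes can be checked numerically. The Python sketch below is only an illustration: the function names are hypothetical, and the closed forms assumed are $E(w_2) = 2/\sqrt\pi$, $E(w_2^2) = 2$, $E(w_4) = (12/\sqrt\pi)\cos^{-1}(-\tfrac13)/2\pi$ and $E(w_4^2) = 2 + 6/\pi + 2\sqrt3/\pi$, which is how equations (10), (11), (13) and (14) are read here.

```python
import math

THETA = math.acos(-1.0 / 3.0)  # primary angle cos^-1(-1/3)


def range_moments_n2():
    """Mean, second moment and variance of the range for n = 2."""
    mean = 2.0 / math.sqrt(math.pi)
    second = 2.0
    return mean, second, second - mean ** 2


def range_moments_n4():
    """Mean, second moment and variance of the range for n = 4."""
    mean = 12.0 / math.sqrt(math.pi) * THETA / (2.0 * math.pi)
    second = 2.0 + 6.0 / math.pi + 2.0 * math.sqrt(3.0) / math.pi
    return mean, second, second - mean ** 2


if __name__ == "__main__":
    print(range_moments_n2())
    print(range_moments_n4())
```

The resulting mean and variance for $n = 4$ agree to three decimals with the commonly tabulated values of the mean and variance of the range in normal samples of four.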





Work is currently proceeding on the Manchester electronic computing machine for the evaluation of 
u,(x) and V, ,(x) (the latter are related to the contents of skew spherical simplices, @,(x)= Vp, ,(:c) 
pn ang 1954)), at least as far as n = 100, x = 6 and for all possible 7. At the same time the first four 
moments of all order statistics in normal samples of size n < 100 are being computed. 

It is also hoped shortly to compute the product moments and hence the covariances of all order 
statistics for normal samples over the same range of n. 


REFERENCES 


Ruben, H. (1954). On the moments of order statistics in samples from normal populations. Biometrika, 41, 200.
Ruben, H. (1956). On the sum of squares of normal scores. Biometrika, 43, 456.










On estimating binomial response relations 


By F. J. ANSCOMBE 
Statistical Laboratory, University of Cambridge* 


Berkson, in numerous papers, has rightly stressed the preferability of assuming a logistic form of dose-response law to assuming an integrated-normal law: (i) the logistic law has, in some applications at least, some sort of theoretical justification, while the integrated-normal law has none, apart from a wholly speculative argument about a distribution of individual tolerances; (ii) the shapes of the two types of response curve are so similar that only a very extensive series of observations could hope to show any appreciable difference in goodness of fit; (iii) simple sufficient statistics are available for the parameters of the logistic law.

The objects of this note are (a) to point out that fitting a logistic law by maximum likelihood is not as laborious as Finney, in several publications, has made it out to be, and (b) to examine Berkson's method of fitting by 'minimum logit $\chi^2$', suggesting a small modification of it. Remarks are also made about testing the adequacy of the assumed logistic form of law, and about fitting the 'angular' response law.


1. NOTATION


Groups of subjects are tested at each of $k$ different values $x_i$ of a predetermined variable $(i = 1, 2, \ldots, k)$. In toxicology, $x_i$ would be the $i$th dose expressed logarithmically; and it will be convenient here to refer to the $x$'s as 'doses'. Of the $n_i$ subjects tested at dose $x_i$, $r_i$ are observed to respond. It is assumed that the $r_i$ have independent binomial distributions with parameters $n_i$ and $P_i$ $(= 1 - Q_i)$, where $P_i$ is a given function of unknown parameters $\alpha$ and $\beta$ requiring estimation. The logistic law is that
$$P_i = [1 + \exp(-\alpha - \beta x_i)]^{-1},\quad\text{or}\quad \ln(P_i/Q_i) = \alpha + \beta x_i. \tag{1}$$
It is also convenient to write $\alpha + \beta x_i = \beta(x_i - \mu)$, so that $\mu$ $(= -\alpha/\beta)$ is the dose for which $P = \tfrac12$.

Often there are two or more such sets of observations, and then it may be appropriate to assume that the parameter $\beta$ is the same for each set. For simplicity of discussion, however, it will be supposed now that only one series of observations is being considered.

Since all symbols for quantities other than the parameters $\alpha, \beta, \mu$ carry the suffix $i$, it will be convenient to omit the suffix, noting that all summations are over $i = 1, 2, \ldots, k$. Estimates of $\alpha$ and $\beta$ will be denoted by $\hat\alpha$ and $\hat\beta$, and values of $P$ and $Q$ obtained from (1) when $\alpha$ and $\beta$ are replaced by $\hat\alpha$ and $\hat\beta$ will be denoted by $\hat P$ and $\hat Q$.


2. FITTING THE LOGISTIC LAW BY MAXIMUM LIKELIHOOD

Since $\Sigma r$ and $\Sigma rx$ are sufficient statistics and simple to calculate, it seems reasonable to use them. The maximum-likelihood equations for $\hat\alpha$ and $\hat\beta$ are (as Berkson has pointed out)
$$\Sigma r = \Sigma n\hat P,\qquad \Sigma rx = \Sigma nx\hat P. \tag{2}$$
They can be solved iteratively as follows (cf. Cornfield, 1954). Given trial values of $\hat\alpha$ and $\hat\beta$, calculate the right-hand sides of (2) and also the estimated weights $\hat W$ defined by
$$\hat W = n\hat P\hat Q. \tag{3}$$
Corrections $\delta\hat\alpha$ and $\delta\hat\beta$ to the trial values are then found by solving the equations
$$\begin{aligned}
(\Sigma\hat W)\,\delta\hat\alpha + (\Sigma\hat Wx)\,\delta\hat\beta &= \Sigma r - \Sigma n\hat P,\\
(\Sigma\hat Wx)\,\delta\hat\alpha + (\Sigma\hat Wx^2)\,\delta\hat\beta &= \Sigma rx - \Sigma nx\hat P.
\end{aligned}\tag{4}$$
In further iterations it is unnecessary to recalculate the coefficients on the left-hand sides of (4), but only the right-hand sides. This procedure is to be compared with Finney's (e.g. 1952), which is exactly modelled on the ingenious procedure due to Fisher and Bliss for fitting the integrated-normal law, for which no such sufficient statistics are available.
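
The iteration just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a transcription of any published routine: the function name `fit_logistic_ml`, its arguments and the stopping tolerance are all hypothetical, and the left-hand coefficients of (4) are computed once from the trial values and then held fixed, as the text suggests.

```python
import math


def fit_logistic_ml(x, n, r, alpha, beta, n_iter=100):
    """Solve sum(r) = sum(n*P), sum(r*x) = sum(n*x*P) for alpha and beta.

    x, n, r: doses, group sizes and numbers responding.
    alpha, beta: trial values. The weights W = n*P*Q are evaluated once
    at the trial values and the 2x2 coefficients are not recalculated.
    """
    def prob(a, b, xi):
        return 1.0 / (1.0 + math.exp(-a - b * xi))

    # Left-hand coefficients of (4), fixed at the trial values.
    w = [ni * prob(alpha, beta, xi) * (1.0 - prob(alpha, beta, xi))
         for xi, ni in zip(x, n)]
    s0 = sum(w)
    s1 = sum(wi * xi for wi, xi in zip(w, x))
    s2 = sum(wi * xi * xi for wi, xi in zip(w, x))
    det = s0 * s2 - s1 * s1

    sum_r = sum(r)
    sum_rx = sum(ri * xi for ri, xi in zip(r, x))
    for _ in range(n_iter):
        p = [prob(alpha, beta, xi) for xi in x]
        # Right-hand sides of (4): discrepancies in the two sufficient statistics.
        g0 = sum_r - sum(ni * pi for ni, pi in zip(n, p))
        g1 = sum_rx - sum(ni * xi * pi for ni, xi, pi in zip(n, x, p))
        # Cramer's rule for the 2x2 system.
        d_alpha = (g0 * s2 - g1 * s1) / det
        d_beta = (s0 * g1 - s1 * g0) / det
        alpha += d_alpha
        beta += d_beta
        if abs(d_alpha) + abs(d_beta) < 1e-12:
            break
    return alpha, beta
```

Holding the coefficients fixed makes each cycle cheap at the cost of linear rather than quadratic convergence; if the trial values are poor, recomputing the weights occasionally would be a natural variation.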


Under the conditions that all the $n$'s are equal, and that the $x$'s are equally and not too widely spaced, and cover such a wide range that $PQ$ may be assumed effectively to vanish outside it, the sums on the


* Present address: Department of Mathematics, Princeton University. 













right-hand sides of (2) may be replaced by integrals, with the aid of the Euler-Maclaurin formula. After integration by parts we obtain
$$\begin{aligned}
\hat\mu &= x_k + \tfrac12 h - h\Sigma r/n,\\
\hat\mu^2 + \pi^2/(3\hat\beta^2) &= (x_k + \tfrac12 h)^2 - \tfrac{1}{12}h^2 - 2h\Sigma rx/n,
\end{aligned}\tag{5}$$
where $x_k$ is the greatest dose and $h$ is the spacing between adjacent doses. The first of these equations is the Spearman-Karber formula for estimating $\mu$, the full efficiency of which was noted by Cornfield & Mantel (1950). The left-hand side of the second equation is approximately $\hat\mu^2 + 3.290/\hat\beta^2$.
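
A small Python sketch of these two closed-form estimates follows. It assumes, as the text requires, a common group size and equally spaced doses covering the whole range of response; the function name `spearman_karber_logistic` and its argument layout are hypothetical.

```python
import math


def spearman_karber_logistic(x, r, n):
    """Closed-form estimates of mu and beta from equations of the form (5).

    x: equally spaced doses in ascending order; r: numbers responding;
    n: the common group size.
    """
    h = x[1] - x[0]                      # spacing between adjacent doses
    top = x[-1] + 0.5 * h                # greatest dose plus half a spacing
    sum_r = sum(r)
    sum_rx = sum(ri * xi for ri, xi in zip(r, x))
    mu = top - h * sum_r / n
    second = top ** 2 - h ** 2 / 12.0 - 2.0 * h * sum_rx / n
    # second = mu^2 + pi^2 / (3 beta^2), so solve for beta.
    beta = math.pi / math.sqrt(3.0 * (second - mu ** 2))
    return mu, beta
```

With responses lying exactly on a logistic curve the function recovers the generating $\mu$ and $\beta$ almost exactly; with real counts the second equation can give a non-positive value under the square root, which this sketch does not guard against.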


3. 'MINIMUM LOGIT $\chi^2$' (Berkson, 1953)


For graphical treatment it is natural to consider the transform of $r$ by the inverse of the logistic function, namely,
$$l = \ln\frac{r}{n-r}. \tag{6}$$
This may be expected to show an approximately linear regression on $x$. Berkson has proposed a non-iterative method of fitting which consists of choosing $\hat\alpha$ and $\hat\beta$ to minimize
$$\Sigma W(l - \hat\alpha - \hat\beta x)^2, \tag{7}$$
where the $W$'s are empirical weights defined by
$$W = r(n-r)/n. \tag{8}$$

(The procedure is modified a little, however, if there are any doses for which $r = 0$ or $n$.) It is easy to see that in large samples this is equivalent to the maximum-likelihood method of fitting. 'In large samples' means that, the $x$'s being fixed, all the $n$'s $\to \infty$.


Now as an estimate of $\alpha + \beta x$ the $l$ defined by (6) is biased, but the bias can be very nearly removed by redefining $l$ thus:
$$l = \ln\frac{r + \tfrac12}{n - r + \tfrac12}. \tag{9}$$
Moments of the asymptotic distribution of this $l$ may be found by expanding the right-hand side of (9) as a Taylor series about the value $nP$ for $r$, raising it to a power, and then taking expectations term by term. We find in this way
$$\mathscr{E}(l) = \alpha + \beta x + O(n^{-2}),\qquad \operatorname{var}(l) \sim \frac{1}{nPQ},\qquad \gamma_1(l) \sim \frac{2(P-Q)}{\sqrt{(nPQ)}}. \tag{10}$$
The last of these results implies that the skewness of the distribution of $l$ is asymptotically twice that of the binomial distribution of $r$ and is of the opposite sign.
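
The size of the remaining bias can be examined directly by summing over the binomial distribution of $r$. The following Python sketch does this for the modified logit with the added halves; the function names `logit_moments` and `logit_bias` are illustrative only.

```python
import math


def logit_moments(n, p):
    """Exact mean and variance of l = ln((r + 1/2) / (n - r + 1/2)),
    where r is binomial with index n and probability p."""
    mean = 0.0
    second = 0.0
    for r in range(n + 1):
        prob = math.comb(n, r) * p ** r * (1.0 - p) ** (n - r)
        l = math.log((r + 0.5) / (n - r + 0.5))
        mean += prob * l
        second += prob * l * l
    return mean, second - mean ** 2


def logit_bias(n, p):
    """Exact bias of l as an estimate of ln(p / (1 - p))."""
    mean, _ = logit_moments(n, p)
    return mean - math.log(p / (1.0 - p))


if __name__ == "__main__":
    for n, p in ((50, 0.3), (50, 0.1), (10, 0.3)):
        mean, var = logit_moments(n, p)
        print(n, p, logit_bias(n, p), var, 1.0 / (n * p * (1.0 - p)))
```

Printing the exact variance next to $1/nPQ$ gives a quick impression of how far the asymptotic approximation can be trusted for a given $n$ and $P$.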

It seems, in fact, that we may accept the asymptotic mean and variance as good approximations, provided $n$ is fairly large and $P$ is not very close to 0 or 1. Since, for given $n$, $l$ cannot lie outside $\pm\ln(2n+1)$, $l$ is valueless as an estimate of $\alpha + \beta x$ if $P$ is sufficiently close to 0 or 1, i.e. if $|x - \mu|$ is large enough. For any numerical values of $n$ and $P$, it is possible to calculate $\mathscr{E}(l - \alpha - \beta x)$ and $\operatorname{var}(l)$ by summation over the binomial distribution of $r$. If $n$ is not too small, it will be sufficient to consider the values for given $nPQ$, in the limit $n \to \infty$, and either $P$ or $Q \to 0$. The formulae needed are then
$$\pm\,\mathscr{E}(l - \alpha - \beta x) = e^{-nPQ}\left\{nPQ\ln 3 + \frac{1}{2!}(nPQ)^2\ln 5 + \frac{1}{3!}(nPQ)^3\ln 7 + \ldots\right\} - \ln(2nPQ), \tag{11}$$
where the upper sign on the left-hand side refers to $P \to 0$, the lower one to $Q \to 0$, and
$$\operatorname{var}(l) = e^{-nPQ}\left\{nPQ(\ln 3)^2 + \frac{1}{2!}(nPQ)^2(\ln 5)^2 + \ldots\right\} - e^{-2nPQ}\left\{nPQ\ln 3 + \frac{1}{2!}(nPQ)^2\ln 5 + \ldots\right\}^2. \tag{12}$$


Some values obtained are shown in Table 1. The bias is positive if $P$ is close to zero, negative if $Q$ is close to zero. Bias and variance are to be compared with the variance $1/nPQ$ assumed in the least-squares fitting. It will be seen that when $nPQ = 2$ the bias is slight and the variance is close to $1/nPQ$; while if
nPQ = } or less, the mean-square error is much less than 1/nPQ but is largely due to the bias. If there 
are two or more doses for which nPQ < 1, the observations may well do more harm than good if they are 
given positive weights in the least-squares estimation of a and f. 
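
The limiting formulae (11) and (12) are Poisson sums and are easily evaluated. The Python sketch below, with the hypothetical name `limiting_moments`, truncates the series after a fixed number of terms, which is ample for the small values of $nPQ$ of interest here.

```python
import math


def limiting_moments(m, terms=200):
    """Bias magnitude, variance and mean-square error of l in the limit
    n -> infinity with m = nPQ held fixed, from the series (11) and (12)."""
    weight = math.exp(-m)        # Poisson probability of r = 0
    s1 = 0.0                     # sum of prob(r) * ln(2r + 1)
    s2 = 0.0                     # sum of prob(r) * ln(2r + 1)^2
    for r in range(1, terms):
        weight *= m / r          # Poisson probability of r
        log_term = math.log(2 * r + 1)
        s1 += weight * log_term
        s2 += weight * log_term ** 2
    bias = s1 - math.log(2.0 * m)
    variance = s2 - s1 ** 2
    return bias, variance, variance + bias ** 2


if __name__ == "__main__":
    for m in (2.0, math.sqrt(2.0), 1.0, 0.5, 0.25):
        print(m, limiting_moments(m))
```

Comparing the printed mean-square error with $1/nPQ$ shows at a glance where the least-squares weights become misleading.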

It seems to me advantageous to try to eliminate the bias in $l$ by using the definition (9) rather than (6), if the method of least squares is to be used, since that leads us to calculate (weighted) sums of the $l$'s. Suppose there are only two doses, and the corresponding values of $nPQ$ exceed 2. Then provided the







weights $W$ that are assigned are positive, their values are immaterial. The fitted regression line passes through the points $(x_1, l_1)$ and $(x_2, l_2)$, and $\hat\alpha$ and $\hat\beta$ are unbiased. If now there are three such doses, roughly equally spaced, the fitted regression line will depend on the weights, but not much; it will make little difference whether empirical weights (8) are used, or fitted weights (3).

The precise definition of the weight function (so long as it approximates to $nPQ$) only begins to matter if (a) there are several doses for which either $P$ or $Q$ is very small, or (b) there are a considerable number of closely spaced doses for which neither $P$ nor $Q$ is very small. If (a), the bias in $l$ when $nPQ$ is small will matter if $W \neq 0$; if (b), a purely empirical weight such as (8) will introduce a bias, the extent of which will depend on how close the doses are. It seems likely that no purely empirical weight function such as (8) can be satisfactory for all possible sets of doses, though it may well be that the procedure explained in detail by Berkson (1953) (and not fully explained here) is satisfactory for all ordinary use where the doses are roughly equally spaced and neither too close nor too far apart.

To guard against the effect of unusual spacing of doses, I have formulated the following recommendation (or rather, two alternative recommendations), which should yield estimates of $\alpha$ and $\beta$ that are almost unbiased and of minimum variance.


Table 1. Moments of the distribution of $l$ defined by (9)

    nPQ     Magnitude of bias     Variance     Mean-square error
    2       0.026                 0.473        0.474
    √2      0.078                 0.515        0.521
    1       0.169                 0.505        0.534
    ½       0.484                 0.385        0.619
    ¼       0.950                 0.240        1.144











SUGGESTED MODIFICATION OF BERKSON'S METHOD. Defining $l$ by (9), minimize the sum of squares (7), where either $W$ is defined by (8) or $W$ is replaced by a fitted weight $\hat W$ equal to $n\hat P\hat Q$ if that exceeds 1 and equal to 0 otherwise.

The empirical weight (8) would be used in cases where it is judged that the weights do not matter much
anyway, as just explained. Otherwise, the fitted weight would be used; that involves the possibility of 
further iterations, but the process will converge quicker than the method of maximum likelihood because 
the iteration only concerns the weights, not the ordinates. The suggested critical value of 1 for nPQ 
is, of course, rough-and-ready—any value between 2 and } might be satisfactory. Unless there are at 
least two doses for which positive weights may confidently be assigned, the method must be considered 
to have broken down, and no estimates should be derived. 
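
A minimal Python sketch of the suggested procedure is given below. It is an illustration under stated assumptions rather than a definitive implementation: the function name `fit_min_logit`, the `weights` switch and the fixed number of refits are hypothetical choices, and the cut-off of 1 for the fitted weight is the rough-and-ready value proposed above.

```python
import math


def fit_min_logit(x, n, r, weights="empirical", n_refits=20):
    """Weighted least-squares fit of l = ln((r + 1/2)/(n - r + 1/2)) on x.

    weights="empirical" uses W = r(n - r)/n; weights="fitted" starts from
    the empirical fit and then iterates with W = n*P*Q where that exceeds 1
    and W = 0 otherwise, P and Q being taken from the current fit.
    """
    l = [math.log((ri + 0.5) / (ni - ri + 0.5)) for ri, ni in zip(r, n)]

    def wls(w):
        if sum(1 for wi in w if wi > 0) < 2:
            raise ValueError("fewer than two doses carry positive weight")
        s0 = sum(w)
        s1 = sum(wi * xi for wi, xi in zip(w, x))
        s2 = sum(wi * xi * xi for wi, xi in zip(w, x))
        t0 = sum(wi * li for wi, li in zip(w, l))
        t1 = sum(wi * xi * li for wi, xi, li in zip(w, x, l))
        det = s0 * s2 - s1 * s1
        # Normal equations of the weighted regression, solved by Cramer's rule.
        return (t0 * s2 - t1 * s1) / det, (s0 * t1 - s1 * t0) / det

    w = [ri * (ni - ri) / ni for ri, ni in zip(r, n)]   # empirical weights
    alpha, beta = wls(w)
    if weights == "fitted":
        for _ in range(n_refits):
            p = [1.0 / (1.0 + math.exp(-alpha - beta * xi)) for xi in x]
            w = [ni * pi * (1.0 - pi) if ni * pi * (1.0 - pi) > 1.0 else 0.0
                 for ni, pi in zip(n, p)]
            alpha, beta = wls(w)
    return alpha, beta
```

Only the weights change between refits; the ordinates $l$ are computed once, which is the reason given in the text for expecting quicker convergence than with maximum likelihood. The function raises an error when fewer than two doses receive positive weight, in line with the remark that the method must then be considered to have broken down.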

Whether the above procedure is preferable to the method of maximum likelihood is a question that 
cannot be answered without reference to the object in view. Granted the assumptions of independent 
binomial observations and a logistic response law, the fullest possible summary of the information 
contained in the observations is provided by the whole likelihood function. For some purposes, it 
suffices to quote merely the position of the maximum of the function, with the values of the second 
derivatives there. Berkson’s procedure may be valuable as a substitute for this, or as a step towards it. 
Since, however, the assumptions cannot be asserted absolutely, a doubt concerning their appropriateness 
must always lurk; and graphical presentation of the data by plotting $l$ against $x$ has value as a check, apart from any approximation to the maximum-likelihood estimates.


4. TESTING GOODNESS OF FIT 


When the question of goodness of fit of an assumed response law arises, it has been the practice to calculate a goodness-of-fit $\chi^2$ (e.g. Armitage & Allen, 1950). This is rather unsatisfactory, since $\chi^2$ will be sensitive to departures from the binomial distribution of responses at any dose, and often in practice one cannot be confident that the binomial distribution is very close to the truth. In any case, $\chi^2$ does not indicate the direction of departure.

A better procedure, for testing the adequacy of the logistic law, would be to estimate parameters $\beta_2$ and $\beta_3$, assumed small, where
$$\ln(P/Q) = \beta(x - \mu) + \beta_2(x - \mu)^2 + \beta_3(x - \mu)^3. \tag{13}$$










The appropriate statistics, besides $\Sigma r$ and $\Sigma rx$ already used in fitting the logistic law, are $\Sigma rx^2$ and $\Sigma rx^3$ (or similar expressions with $Wl$ instead of $r$). There is not much chance of detecting anything interesting unless there are a considerable number of series of observations (in the field of research being considered), and for each series the doses satisfy the conditions mentioned above for the Spearman-Karber method to be valid, especially the condition of covering a wide range. It is then easy to estimate $\beta_2$ and $\beta_3$ for each series, by equations similar to (5).

If $\beta_2 \neq 0$, the response curve of $P$ against $x$ is not antisymmetrical about $x = \mu$. This situation might be remedied by a transformation of the $x$-scale. If $\beta_2 = 0$ and $\beta_3 \neq 0$, the logistic law is not satisfactory, but possibly one of the other suggested laws might be (see Finney, 1952, chap. 17). If estimates of $\beta_2$ and $\beta_3$ are available from a number of assays, their signs, rather than their values, will be of particular interest.


5. FITTING THE ANGULAR LAW


The only other law besides the logistic which seems at present to be worth considering is
$$P = \sin^2(\alpha + \beta x)\ \ \text{if}\ \ 0 \leqslant \alpha + \beta x \leqslant \tfrac12\pi,\qquad P = 0\ \ \text{otherwise}. \tag{14}$$
Here the weight is constant (and positive) for $\alpha + \beta x$ in $(0, \tfrac12\pi)$ and zero elsewhere. For this law, as for the logistic, the transform of $r$ by the inverse response function can be modified so that it is asymptotically unbiased, whatever the value of $\alpha + \beta x$ in the range $(0, \tfrac12\pi)$. The modified transformation, corresponding to (9) above, is (Anscombe, 1954)


$$y = \sin^{-1}\sqrt{\frac{r + \tfrac38}{n + \tfrac34}}. \tag{15}$$

Moments, corresponding to (10), are
$$\mathscr{E}(y) = \alpha + \beta x + O(n^{-2}),\qquad \operatorname{var}(y) \sim \frac{1}{4n},\qquad \gamma_1(y) \sim \frac{P - Q}{2\sqrt{(nPQ)}}. \tag{16}$$
The device of adding equal constants to $r$ and $n - r$ so that the transformed quantity is approximately unbiased, for all $x$ in an appropriate range, is possible only for response laws of a certain mathematical type, such that $x$ can be expressed as an integral with respect to $P$ of a negative power of $P(1 - P)$. The device is not therefore available for the integrated-normal response law.


I have taken advantage of helpful comments by Dr P. Armitage, Dr J. Berkson and Dr J. Cornfield 
in revising the draft of this paper. 


Added in proof. In §3 above it is pointed out that (i) there seems to be no particular reason why the transformed quantity $l$ should be defined by (6) rather than by some less obvious expression such as (9), (ii) the properties of $l$ can be, and need to be, investigated, and (iii) if $n$ is large the definition (9) is superior to (6), and seems satisfactory. Nothing has been said about the minimum logit $\chi^2$ method when $n$ is small; the results given might be expected to apply if $n = 50$, but hardly
if n=5. Even if n is large, (9) is not the best possible definition; Professor J. W. Tukey has pointed 
out that (9) can be improved by adding } to the right-hand side of (9) when r=n, and subtracting 
4 when r=0, so that the range of values assumed by / is widened a little. No doubt by numerical 
study it would be possible to improve substantially on (9) when n is small. 


REFERENCES 


Anscombe, F. J. (1954). Comments on a paper by R. A. Fisher. Biometrics, 10, 141-4.

Armitage, P. & Allen, I. (1950). Methods of estimating the LD 50 in quantal response data. J. Hyg., Camb., 48, 298-322.

Berkson, J. (1953). A statistically precise and relatively simple method of estimating the bio-assay with quantal response, based on the logistic function. J. Amer. Statist. Ass. 48, 565-99.

Cornfield, J. (1954). Measurement and comparison of toxicities: the quantal response. Statistics and Mathematics in Biology, pp. 327-44. Ames: Iowa State College Press.

Cornfield, J. & Mantel, N. (1950). Some new aspects of the application of maximum likelihood to the calculation of the dosage response curve. J. Amer. Statist. Ass. 45, 181-210.

Finney, D. J. (1952). Statistical Method in Biological Assay. London: Griffin.













Existence and uniqueness of a uniformly most powerful randomized 
unbiased test for the binomial 


By A. A. BLANK 
Department of Mathematics, University of Tennessee 


INTRODUCTION 


Tocher (1950) applied the Neyman-Pearson theory of testing to discrete variables. In an example he pointed out the fact that among unbiased tests for the binomial those which are most powerful possess a certain special form. The questions of uniqueness and existence were left open.

Let $\theta$ denote the probability of success in a binomial trial. The problem is to test the hypothesis $\theta = p \in (0, 1)$ against the alternative $\theta \neq p$ by performing a sequence of $n$ binomial trials. Let $X$ be a random variable denoting the number of successes in the $n$ trials. A test is equivalent to the choice of an acceptance criterion $0 \leqslant \psi(X) \leqslant 1$ defined so that the hypothesis $\theta = p$ is accepted with probability $\psi_\nu = \psi(\nu)$ whenever $\nu$ is the number of successes $(\nu = 0, 1, 2, \ldots, n)$. The probability of accepting the hypothesis in the event of its truth is then
$$\sum_{\nu=0}^{n}\psi_\nu b_\nu(p) = 1 - \alpha, \tag{1}$$
where $b_\nu(p)$ is the binomial term $\binom{n}{\nu}p^{\nu}(1-p)^{n-\nu}$ and $\alpha$ is the size of the test. Such a test is said to be unbiased if the probability of accepting the hypothesis when false is less than the probability of acceptance when true, that is,
$$\sum_{\nu=0}^{n}\psi_\nu b_\nu(\theta) \leqslant 1 - \alpha \quad (0 \leqslant \theta \leqslant 1). \tag{2}$$

A test is said to be uniformly most powerful if simultaneously for all alternatives $\theta \neq p$ the probability of falsely accepting the hypothesis is minimized. Tocher showed that if a uniformly most powerful test exists among unbiased size $\alpha$ tests it must have the form
$$\left.\begin{aligned}
&\psi_\nu = 1 \quad\text{for}\quad s + 1 \leqslant \nu \leqslant t - 1,\\
&\psi_s = c,\quad \psi_t = d,\\
&\psi_\nu = 0 \quad\text{otherwise}.
\end{aligned}\right\}\tag{3}$$

It will be shown that among tests of this form there exists a unique unbiased test. Since Tocher's analysis may be applied directly to show this is most powerful among unbiased tests the proof will then be complete.


PROOF OF EXISTENCE AND UNIQUENESS 


Let us assume that a test (3) is given. We seek a value $\theta = p$ which minimizes the size $\alpha$ and maximizes the acceptance function
$$P(\theta) = \sum_{0}^{n}\psi_\nu b_\nu(\theta) = c\,b_s(\theta) + d\,b_t(\theta) + \sum_{s+1}^{t-1} b_\nu(\theta)$$
in the interval $0 \leqslant \theta \leqslant 1$, where we assume
$$0 < c, d \leqslant 1,\qquad 0 \leqslant s < t \leqslant n.$$

The effect of the first assumption is to guarantee that the terms of index $s$ and $t$ are present. The second is not restrictive since the proof is direct when $s = t$.

We have
$$P'(\theta) = \sum_{k=0}^{n}\psi_k b_k(\theta)\,\frac{k - n\theta}{\theta(1-\theta)} = \frac{1}{\theta(1-\theta)}\,[S b_s - T b_t],$$
where
$$S = s(1-\theta) + (1-c)(n\theta - s),$$
$$T = t(1-\theta) - d(t - n\theta) = t(1-d) + \theta(nd - t).$$













The derivative $P'(\theta)$ can vanish only if (i) $\theta = 0$, (ii) $\theta = 1$, or (iii) $\theta$ lies in the open interval $I = (s/n, t/n)$.

Consider first the case $s = 0$, $t = n$. If $c = d = 1$, $P(\theta) \equiv 1$ and the case is trivial. Otherwise $p$ must satisfy $\{p/(1-p)\}^{n-1} = (1-c)/(1-d)$, which has a unique solution since $\theta/(1-\theta)$ is monotone and unbounded. A unique extremum exists.


If not both $s = 0$, $t = n$, it may be seen that $S$ is positive for $\theta \in I$. From the second representation of $T$ above it is easily seen that $T$, also, is positive. It follows that $p$ satisfies the equation


The left-hand side of this equation is monotone “apo in 6 since 


we d lo T ) . t—s 
ain a6 V8: 5) > -5 loses) =~ ao)" 
The existence and uniqueness of an extremum of $P(\theta)$ in $I$ has been proved. Further, at the extremum,


b,[S’ T’ t- 
P"(p) =" al SF |< 





pqaLS TT pq 

ine max (5 7) : eS : 
since ———} = ---<—_, 
ea \S TT] pq pq 


where $q = 1 - p$, we conclude that the extremum is a maximum.

In the open interval (0, 1) there is at most one local maximum. If a maximum should occur at $\theta = 0$ there could be no other local maximum, otherwise the open interval between the maxima would have to contain a minimum. The same result would hold should a maximum occur at $\theta = 1$.

In all cases there is a unique maximum of $P(\theta)$.

All that remains is to prove uniqueness of the test subject to the conditions:

(a) $P(p) = 1 - \alpha$,
(b) $P'(p) = 0$,
(c) $P''(p) < 0$.

Omitting, for the moment, the condition (a) on the size of the test, we find from (b) and (c) that $p$ is an increasing function of $s^* = s + 1 - c$ and of $t^* = t - 1 + d$, independently, since
op op | _np-s 


as* dc Pp Pp)? -" 


Op Op _ t—mnp 
a* ad Pp) 

Given $p$ $(0 < p < 1)$, the condition (a) determines $t^*$ uniquely as an increasing function of $s^*$. The value of $\theta$ which maximizes $P(\theta)$ must then be definitely increasing with $s^*$ and cannot assume the value $p$ more than once. It follows that the test is unique for all $p$ in the open interval.

The special cases $p = 0$, $p = 1$, are treated differently. It is sufficient to consider the case $p = 0$. For this case there is no unique size $\alpha$ test. We must have $s^* = \alpha$ and $t^*$ may take any allowable value. Unbiasedness is assured by taking $\psi_k \leqslant 1 - \alpha$. However, the uniformly most powerful test among these is given by $\psi_0 = 1 - \alpha$, $\psi_k = 0$ $(k > 0)$. A similar criterion exists for $p = 1$.
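
For $0 < p < 1$ the unique test can be located numerically from the two conditions $P(p) = 1 - \alpha$ and $P'(p) = 0$. The Python sketch below is one possible way of doing so and is not taken from the paper: the function name `umpu_binomial_test` and the nested bisection are assumptions made for illustration. It works with the continuous end-points `lower` $= s^*$ and `upper` $= t^* + 1$, so that the acceptance probability at $\nu$ is the length of the overlap of $[\nu, \nu + 1]$ with $[s^*, t^* + 1]$.

```python
import math


def umpu_binomial_test(n, p, alpha, n_bisect=80):
    """Find s, c, t, d of the unbiased test of the stated form, 0 < p < 1."""
    b = [math.comb(n, v) * p ** v * (1 - p) ** (n - v) for v in range(n + 1)]
    g = [(v - n * p) * bv for v, bv in enumerate(b)]   # terms of P'(p)

    def interp(y, terms):
        # Sum of terms[v] for v < floor(y) plus the fraction of term floor(y).
        k = int(math.floor(y))
        total = sum(terms[:k])
        if k <= n:
            total += (y - k) * terms[k]
        return total

    def solve(target, lo, hi):
        # Bisection for y with interp(y, b) == target (increasing in y).
        for _ in range(n_bisect):
            mid = 0.5 * (lo + hi)
            if interp(mid, b) < target:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    def upper_end(lower):
        # Upper end-point fixed by the size condition (a).
        return solve(interp(lower, b) + 1.0 - alpha, lower, n + 1.0)

    lo, hi = 0.0, solve(alpha, 0.0, n + 1.0)
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        # Condition (b): the weighted sum of (v - n p) over the region is zero.
        if interp(upper_end(mid), g) - interp(mid, g) < 0.0:
            lo = mid
        else:
            hi = mid
    lower = 0.5 * (lo + hi)
    upper = upper_end(lower)
    s = min(int(math.floor(lower)), n)
    t = min(int(math.floor(upper)), n)
    return {"s": s, "c": s + 1 - lower, "t": t, "d": upper - t,
            "lower": lower, "upper": upper}
```

For example, `umpu_binomial_test(10, 0.5, 0.05)` should return a region symmetric about 5, as the symmetry of the binomial at $p = \tfrac12$ requires. When `s` and `t` coincide the values `c` and `d` are no longer separately meaningful and the pair `lower`, `upper` should be used instead.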





CONFIDENCE ESTIMATES 


It is well known that the Neyman-Pearson method of testing may also be used as a means of estimation. The test described above leads to a confidence interval estimate of $p$ at the confidence level $1 - \alpha$. It is only necessary to tabulate $s^*$ and $t^*$ against $p$ and $n$. To obtain the desired confidence estimate we observe the number of successes $\nu$ in $n$ binomial trials and a value $x$ of a random variable which has a rectangular distribution in the interval $0 \leqslant x \leqslant 1$. We obtain the respective lower and upper confidence limits $p_0$ and $p_1$ by locating in the table the values satisfying


t*( po) =V—2a@, 8*() = V+. 


REFERENCE 


TOCHER, K. D. (1950). Extension of the Neyman-Pearson theory of tests to discontinuous variates.
Biometrika, 37, 130. 










Miscellanea 467 


A note on the circular multivariate distribution 


By G. 8. WATSON 
The Australian National University, Canberra, A.C.T. 


1. Summary. In Anderson's (1941) paper on the distribution of serial correlation coefficients, Hotelling's
suggestion of a circular definition of these coefficients appears. Since then this device has been
used constantly to lighten the mathematical difficulties in serial correlation theory. There has, however,
been no suggestion that the model would ever be a good approximation in practice; the reverse has
always been suggested in fact. It is the purpose of the present note to point out a fairly common class of
data in which the circular model will be a good approximation. A typical case of this occurs when one
wishes to study seasonal variation and begins by averaging the data over many years.


2. Periodically averaged data. Let $\{x_t\}$ be generated by a stationary process with, for simplicity, zero
mean and unit variance and a correlation function $\rho_s = E(x_i x_j)$, where $s = |i-j|$.
Suppose $Nn$ consecutive $x$'s are observed and define

$$y_j = x_j + x_{j+n} + \dots + x_{j+(N-1)n} \quad (j = 1, \dots, n).$$

Then, with $|i-j| = s$,

$$\operatorname{var}(y_j) = N + 2(N-1)\rho_n + 2(N-2)\rho_{2n} + \dots + 2\rho_{(N-1)n},$$

$$\operatorname{covar}(y_i, y_j) = N\rho_s + (N-1)(\rho_{n-s} + \rho_{n+s}) + (N-2)(\rho_{2n-s} + \rho_{2n+s}) + \dots + (\rho_{(N-1)n-s} + \rho_{(N-1)n+s})$$

and

$$\operatorname{corr}(y_i, y_j) = \frac{\rho_s + \left(1 - \frac{1}{N}\right)(\rho_{n-s} + \rho_{n+s}) + \dots + \left(1 - \frac{N-1}{N}\right)(\rho_{(N-1)n-s} + \rho_{(N-1)n+s})}{1 + 2\left(1 - \frac{1}{N}\right)\rho_n + 2\left(1 - \frac{2}{N}\right)\rho_{2n} + \dots + 2\left(1 - \frac{N-1}{N}\right)\rho_{(N-1)n}}.$$

When $N \to \infty$,

$$\operatorname{corr}(y_i, y_j) \to \frac{\rho_s + \rho_{n-s} + \rho_{n+s} + \dots + \rho_{(N-1)n-s} + \rho_{(N-1)n+s}}{1 + 2\rho_n + 2\rho_{2n} + \dots + 2\rho_{(N-1)n}}.$$

If in this expression $s$ is replaced by $n-s$, the expression is only altered by the addition of
$\rho_{Nn-s} - \rho_{(N-1)n+s}$ to the numerator. This additional term will usually be negligible in practice.

Thus, if $N$ is sufficiently large, the variables $y_j$ ($j = 1, \dots, n$) will have zero means, constant variances,
and the correlation of $y_i$ and $y_j$ will be the same for $|i-j| = s$ ($s = 1, \dots, n-1$) as for $|i-j| = n-s$.
This is the most general case of the circular multivariate distribution.

Further, if the basic process is such that $\rho_r$ may be neglected for $r \ge n$, then

$$\operatorname{corr}(y_i, y_j) = \rho_s + \rho_{n-s} \quad (|i-j| = s).$$

If, for example, the basic process is a first-order autoregressive process, with autocorrelation function
$\rho_s = \rho^s$, we find that

$$\operatorname{corr}(y_i, y_j) = \rho^s + \rho^{n-s} \simeq \frac{\rho^s + \rho^{n-s}}{1 + \rho^n},$$

since we are assuming that $\rho^r$ ($r \ge n$) is negligible. This last correlation function is that of the first-order
circular autoregressive process considered by, among others, Anderson & Anderson (1950).
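
The approach to the circular form can be checked numerically. The following is a minimal sketch, not part of the original note: it evaluates the correlation of the periodically averaged variables for a first-order autoregressive process and compares it with the circular form. The function names and the values of `r`, `n` and `N` are illustrative assumptions.

```python
def averaged_corr(rho, n, N, s):
    """Correlation of y_i and y_j (|i - j| = s) when each y is a sum of N terms spaced n apart."""
    num = rho(s)
    den = 1.0
    for k in range(1, N):
        w = 1.0 - k / N                      # weight (1 - k/N) attached to lags near k*n
        num += w * (rho(k * n - s) + rho(k * n + s))
        den += 2.0 * w * rho(k * n)
    return num / den


def circular_ar1_corr(r, n, s):
    """Correlation function of the first-order circular autoregressive process."""
    return (r ** s + r ** (n - s)) / (1.0 + r ** n)


# Assumed illustrative values: AR(1) parameter r, period n, number of periods N.
r, n, N = 0.6, 8, 400
ar1 = lambda lag: r ** abs(lag)

max_gap = max(abs(averaged_corr(ar1, n, N, s) - circular_ar1_corr(r, n, s))
              for s in range(1, n))
max_asym = max(abs(averaged_corr(ar1, n, N, s) - averaged_corr(ar1, n, N, n - s))
               for s in range(1, n))
print(max_gap, max_asym)   # both shrink as N grows
```

With these values both discrepancies are small, and they decrease further as `N` is increased.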


REFERENCES 


ANDERSON, R. L. (1941). Distribution of the serial correlation coefficient. Ann. Math. Statist. 13, 1. 
ANDERSON, R. L. & ANDERSON, T. W. (1950). Distribution of the serial correlation coefficient for
residuals from a fitted Fourier series. Ann. Math. Statist. 21, 59. 













The fitting of regression curves with autocorrelated data 


By N. A. HUTTLY 
Marconi’s Wireless Telegraph Company, Chelmsford 


1. INTRODUCTION 


This investigation was prompted by a study of problems dealing with measurements of reflected radar 
signals from a moving target, one such measurement being the difference in phase between signals 
received at two different aerials from the same target. In this particular problem the output from the 
radar was a continuous wave form, so that the data under investigation was also of a continuous nature. 

From physical considerations of this problem (e.g. the assumption that the target had no acceleration) 
it was expected that the phase difference would vary linearly with time. Thus the statistical investigation 
was concerned with the determination of this linear trend, that is, the problem became, in essence, that 
of fitting a regression line to continuously varying data. In order to deal with data of this form it is 
necessary to have recourse to the theory of autocorrelation, for all continuous data must be auto- 
correlated. 

This paper is a short investigation into the general principles involved in regression fitting when the 
data involved is autocorrelated. 


2. GENERAL DISCUSSION OF THE PROBLEM 


In the classical theory of regression for random normal variables the method of least squares and the 
method of maximum likelihood are equivalent, but for stochastically dependent variables this is no 
longer, in general, true. In the case of large samples it has been shown that the two methods are asymptotic- 
ally equivalent (Grenander, 1954; Grenander & Rosenblatt, 1954; Wold, 1950) for stationary stochastic 
processes. We shall, in this paper, be dealing with the problem of ‘small’ samples where we define a small 
sample as being a sample whose equivalent number of independent points is small* although the total 
size of the same may be large.†

We shall only be concerned here with a simplified form of stochastic dependence, namely, that in 
which the parameters of the process are known a priori. This case very often occurs in problems con- 
cerning radio signals, the process being determined by virtue of the Wiener—Khintchine relation (Rice, 
1944) from the power spectrum and bandwidth of the receiver. We have confined ourselves to the cases of
linear and quadratic regression, but the methods could easily be extended to higher-order regression 
curves. 

In §§ 3 and 4 we develop the general results appertaining to the least-squares and maximum-likelihood 
approaches, and in §5 we utilize these results for a special form of stochastic dependence, the linear 
Markov process. 


3. LEAST-SQUARES SOLUTION 
We may formulate as our model 


$$y(t) = \alpha + \beta t + \gamma t^2 + \dots + \epsilon(t),$$

where $y(t)$ is the observed variable; $\alpha$, $\beta$, $\gamma$, the constants of the regression curve, which are to be estimated
from the data; $\epsilon(t)$ is a stationary random variable.

Without loss of generality we take $\epsilon(t)$ to have zero mean, variance $\sigma^2$ and autocorrelation function
$\rho(\tau)$. We also take, for convenience, the period of observation to be

$$-T \le t \le +T.$$


3.1. Linear regression
Let us consider
$$y(t) = \alpha + \beta t + \epsilon(t). \tag{1}$$
Since we sample in the range $(-T \le t \le +T)$ we have
$$\int_{-T}^{T} t^r\,dt = 0 \quad (r \text{ odd}). \tag{2}$$

* The sample may be discrete or continuous.

† By equivalent number of points we mean that number of independent points which gives the
same amount of information as the sample considered.













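In accordance with the general theory of least squares we form

$$S = \int_{-T}^{T} \{y(t) - \alpha - \beta t\}^2\,dt,$$

and solve the equations

$$\frac{\partial S}{\partial \alpha} = 0 = \frac{\partial S}{\partial \beta}.$$

This procedure leads us to the two equations

$$\hat\alpha = \left(\int_{-T}^{T} y(t)\,dt\right) \Big/ (2T), \tag{3}$$

$$\hat\beta = 3\left(\int_{-T}^{T} t\,y(t)\,dt\right) \Big/ (2T^3) \tag{4}$$

(by virtue of equation (2)). Since $\hat\alpha$, $\hat\beta$ are functions of $y(t)$ and therefore of $\epsilon(t)$ they are random variables,
therefore we can form their means and variances. From (1) we have

$$E\{y(t)\} = \alpha + \beta t.$$

Therefore
$$E(\hat\alpha) = \alpha, \qquad E(\hat\beta) = \beta,$$
and
$$\operatorname{var}(\hat\alpha) = E\left\{\left(\int_{-T}^{T} \epsilon(t)\,dt\right)^2\right\} \Big/ (4T^2), \tag{5}$$
$$\operatorname{var}(\hat\beta) = 9E\left\{\left(\int_{-T}^{T} t\,\epsilon(t)\,dt\right)^2\right\} \Big/ (4T^6). \tag{6}$$
From (5)
$$4T^2 \operatorname{var}(\hat\alpha) = E\left\{\int_{-T}^{T}\int_{-T}^{T} \epsilon(t)\,\epsilon(t')\,dt\,dt'\right\},$$
and on putting $\tau = t' - t$, since
$$E\{\epsilon(t)\,\epsilon(t+\tau)\} = \sigma^2\rho(\tau),$$
we have
$$4T^2 \operatorname{var}(\hat\alpha) = \sigma^2 \int_{-T}^{T} dt \int_{-T-t}^{T-t} \rho(\tau)\,d\tau,$$
and we reverse the order of integration so that
$$\int_{-T}^{T} dt \int_{-T-t}^{T-t} d\tau \to 2\int_{0}^{2T} d\tau \int_{-T}^{T-\tau} dt,$$
and finally,
$$\operatorname{var}(\hat\alpha) = \sigma^2 \left(\int_{0}^{2T} (2T - \tau)\,\rho(\tau)\,d\tau\right) \Big/ (2T^2) \tag{7}$$
and
$$\operatorname{var}(\hat\beta) = 3\sigma^2 \left(\int_{0}^{2T} (4T^3 - 6T^2\tau + \tau^3)\,\rho(\tau)\,d\tau\right) \Big/ (4T^6). \tag{8}$$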

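3.2. Quadratic regression
Let us now consider
$$y(t) = \alpha + \beta t + \gamma t^2 + \epsilon(t).$$

From the normal equations we get

$$\hat\alpha = 3\left(3T^2 \int_{-T}^{T} y(t)\,dt - 5\int_{-T}^{T} t^2 y(t)\,dt\right) \Big/ (8T^3), \tag{9}$$
$$\hat\beta = 3\left(\int_{-T}^{T} t\,y(t)\,dt\right) \Big/ (2T^3), \tag{10}$$
$$\hat\gamma = 15\left(3\int_{-T}^{T} t^2 y(t)\,dt - T^2 \int_{-T}^{T} y(t)\,dt\right) \Big/ (8T^5), \tag{11}$$

leading to
$$E(\hat\alpha) = \alpha, \qquad E(\hat\beta) = \beta, \qquad E(\hat\gamma) = \gamma,$$
and
$$\operatorname{var}(\hat\alpha) = \left\{3\sigma^2 \int_{0}^{2T} (48T^5 - 24T^4\tau - 80T^3\tau^2 + 60T^2\tau^3 - 5\tau^5)\,\rho(\tau)\,d\tau\right\} \Big/ (64T^6), \tag{12}$$
$$\operatorname{var}(\hat\beta) = \left\{3\sigma^2 \int_{0}^{2T} (4T^3 - 6T^2\tau + \tau^3)\,\rho(\tau)\,d\tau\right\} \Big/ (4T^6), \tag{13}$$
$$\operatorname{var}(\hat\gamma) = \left\{45\sigma^2 \int_{0}^{2T} (16T^5 - 40T^4\tau + 20T^2\tau^3 - 3\tau^5)\,\rho(\tau)\,d\tau\right\} \Big/ (64T^{10}). \tag{14}$$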
0 


The above analysis is true whatever the distribution of $\epsilon(t)$, but in order to effect a comparison with the
maximum-likelihood solution we shall assume that the $\epsilon(t)$'s are normally distributed.


4. MAXIMUM-LIKELIHOOD SOLUTION 


The method that we have used in this section is to determine the maximum-likelihood solution for the
case of discrete variables, then let the distance between the variables become infinitesimal so that in the
limit we have the continuous case.

Therefore we use as our model

$$y_r(t) = \alpha + \beta t_r + \gamma t_r^2 + \dots + \epsilon_r(t).$$


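4.1. Linear regression
We have the likelihood of obtaining a given sample of size $n$ as
$$L = P(\epsilon_1, \dots, \epsilon_n)$$

and if the $\epsilon_r$ follow a normal distribution, then

$$L = K \exp\left[-\left(\sum_{r,s=1}^{n} \omega_{rs}\,\epsilon_r \epsilon_s\right) \Big/ (2\omega)\right],$$

where $K$ is a constant, $\omega$ the variance-covariance determinant of the $\epsilon_r$'s and $\omega_{rs}$ the cofactor of $\rho_{rs}$ in $\omega$.
In order to simplify matters we make our sample of size $2n + 1$ such that

$$r = -n, -(n-1), \dots, -1, 0, 1, \dots, (n-1), n.$$
We get, therefore, by the theory of maximum likelihood, the equations
$$\frac{\partial \log L}{\partial \alpha} = \frac{\partial \log L}{\partial \beta} = 0 \tag{15}$$
to solve.
Now since
$$\sum_{r=-n}^{n} \sum_{s=-n}^{n} \omega_{rs}\,t_r^p\,t_s^q = 0 \quad \text{if } p+q \text{ is odd},$$
we get as the solutions of (15) the estimates $\hat\alpha$, $\hat\beta$ given by
$$\hat\alpha = (y)/(1), \tag{16}$$
$$\hat\beta = (ty)/(tt'), \tag{17}$$
where
$$\sum_{r=-n}^{n} \sum_{s=-n}^{n} \omega_{rs} = (1), \qquad \sum_{r=-n}^{n} \sum_{s=-n}^{n} \omega_{rs}\,t_r^p\,t_s^q = (t^p t'^q),$$
the same bracket notation being used with $y_s$ or $\epsilon_s$ in place of $t_s^q$, so that, for example, $(ty) = \sum_r \sum_s \omega_{rs}\,t_r\,y_s$.

N.B. Since $\omega_{rs} = \omega_{sr}$ (from the symmetry of the variance-covariance determinant) the above expressions
are symmetrical also, i.e. $(t^p t'^q) = (t'^p t^q)$.

From (16) and (17) we have, using the definition of $E(y_r)$,
$$E(\hat\alpha) = \alpha, \qquad E(\hat\beta) = \beta$$
and
$$\operatorname{var}(\hat\alpha) = E\{(\epsilon)^2\}/(1)^2,$$
$$\operatorname{var}(\hat\beta) = E\{(t\epsilon)^2\}/(tt')^2.$$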










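But we have
$$E\{(t\epsilon)^2\} = E\left\{\left(\sum_{r=-n}^{n} \sum_{s=-n}^{n} \omega_{rs}\,t_r\,\epsilon_s\right)^2\right\} = E\left\{\sum_{r=-n}^{n} \sum_{s=-n}^{n} \sum_{u=-n}^{n} \sum_{v=-n}^{n} \omega_{rs}\,\omega_{uv}\,t_r\,t_u\,\epsilon_s\,\epsilon_v\right\},$$
and since $E(\epsilon_s \epsilon_v) = \sigma^2 \rho_{sv}$,

$$E\{(t\epsilon)^2\} = \sigma^2 \sum_{r=-n}^{n} \sum_{s=-n}^{n} \sum_{u=-n}^{n} \sum_{v=-n}^{n} \omega_{rs}\,\omega_{uv}\,t_r\,t_u\,\rho_{sv}.$$

Further, by the theory of determinants,

$$\sum_{v} \omega_{uv}\,\rho_{sv} = \begin{cases} 0 & (s \ne u), \\ \omega & (s = u). \end{cases}$$

Therefore
$$E\{(t\epsilon)^2\} = \omega\sigma^2 \sum_{r=-n}^{n} \sum_{s=-n}^{n} \omega_{rs}\,t_r\,t_s = \omega\sigma^2 (tt').$$
Hence we finally get
$$\operatorname{var}\hat\alpha = \sigma^2\omega/(1), \tag{18}$$
$$\operatorname{var}\hat\beta = \sigma^2\omega/(tt'). \tag{19}$$
4.2. Quadratic regression
We now consider
$$y_r(t) = \alpha + \beta t_r + \gamma t_r^2 + \epsilon_r(t).$$
The maximum-likelihood equations give us
$$\hat\alpha = \{(t^2t'^2)(y) - (t^2)(t^2y)\}/\{(t^2t'^2)(1) - (t^2)^2\}, \tag{20}$$
$$\hat\beta = (ty)/(tt'), \tag{21}$$
$$\hat\gamma = \{(1)(t^2y) - (t^2)(y)\}/\{(t^2t'^2)(1) - (t^2)^2\}, \tag{22}$$
leading to
$$E(\hat\alpha) = \alpha, \qquad E(\hat\beta) = \beta, \qquad E(\hat\gamma) = \gamma$$
and
$$\operatorname{var}\hat\alpha = \omega\sigma^2 (t^2t'^2)/\{(t^2t'^2)(1) - (t^2)^2\}, \tag{23}$$
$$\operatorname{var}\hat\beta = \omega\sigma^2/(tt'), \tag{24}$$
$$\operatorname{var}\hat\gamma = \omega\sigma^2 (1)/\{(t^2t'^2)(1) - (t^2)^2\}. \tag{25}$$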


5. RESULTS 


In §§3 and 4, we have derived the formal expressions for the least-squares and maximum-likelihood 
solutions to the problem of fitting regression curves to autocorrelated data. 

By letting the summation expressions in equations (18), (19), (23), (24) and (25) tend to their integral 
limits we get a formal comparison between the two methods for continuous variables. In this section 
we shall examine the relative efficiencies of the two methods for the case of Markov dependence. This 
gives us 


$$\rho(\tau) = e^{-m|\tau|},$$
and since $m$ is only a scaling factor we can incorporate it into the sampling range $2T$ so that
$$\rho(\tau) = e^{-|\tau|}.$$


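5.1. Linear regression

Substituting into (7) and (8) we get for the least-squares solution

$$\operatorname{var}(\hat\alpha) = \sigma^2 (2T - 1 + e^{-2T})/(2T^2), \tag{26}$$
$$\operatorname{var}(\hat\beta) = 3\sigma^2 \{2T^3 - 3T^2 + 3 - 3e^{-2T}(1 + T)^2\}/(2T^6). \tag{27}$$
For the maximum-likelihood solution we have
$$\omega = \begin{vmatrix} 1 & \rho & \rho^2 & \cdots \\ \rho & 1 & \rho & \cdots \\ \vdots & & & \end{vmatrix} = (1 - \rho^2)^{2n}$$
and
$$\omega_{nn} = \omega_{-n,-n} = (1 - \rho^2)^{2n-1},$$
$$\omega_{rr} = (1 + \rho^2)(1 - \rho^2)^{2n-1} \quad (|r| < n),$$
$$\omega_{r,r\pm 1} = -\rho (1 - \rho^2)^{2n-1},$$
$$\omega_{rs} = 0 \quad \text{otherwise}.$$

We sample at equal intervals throughout the range $2T$, therefore

$$t_r = \frac{rT}{n} \quad (r = -n, \dots, 0, \dots, n)$$

and
$$\rho = e^{-T/n}.$$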





[Fig. 1. Percentage efficiency of the least-squares estimators, plotted against $T$ (horizontal axis 0 to 16). Curves: $\hat\alpha$ in linear regression; $\hat\alpha$ in quadratic regression; $\hat\beta$ in linear/quadratic regression; $\hat\gamma$ in quadratic regression.]
This gives us
$$(1) = \sum_{r=-n}^{n} \sum_{s=-n}^{n} \omega_{rs} = (1 - \rho^2)^{2n-1} \{2(1 - \rho) + (1 - \rho)^2 (2n - 1)\}.$$
Therefore
$$\lim_{n \to \infty} (1) = \lim_{n \to \infty} (2T/n)^{2n} (1 + T),$$

and similarly for $(tt')$, $(t^2)$, etc. So that upon substitution into (18) and (19) we get

$$\operatorname{var}\hat\alpha = \sigma^2/(T + 1), \tag{28}$$
$$\operatorname{var}\hat\beta = 3\sigma^2/\{T(T^2 + 3T + 3)\}. \tag{29}$$

Hence by taking the ratios of (28) to (26) and of (29) to (27) we can measure the efficiency of the least-squares
estimators of the regression line as compared to the maximum-likelihood estimators. Fig. 1 depicts
this measure as a percentage.
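
The passage to the limit can be illustrated numerically. The sketch below is an assumption-based check, not the author's computation: instead of cofactors it uses the standard innovations identity for the inverse of a first-order autoregressive correlation matrix, $u'R^{-1}v = u_0 v_0 + \sum_a (u_a - \rho u_{a-1})(v_a - \rho v_{a-1})/(1 - \rho^2)$, which equals the bracket sums divided by $\omega$. The function names and the values of `T` and `n` are illustrative.

```python
import math


def ar1_quad_form(u, v, rho):
    """u' R^{-1} v for the correlation matrix R[r][s] = rho ** |r - s| (innovations form)."""
    total = u[0] * v[0]
    for a in range(1, len(u)):
        total += (u[a] - rho * u[a - 1]) * (v[a] - rho * v[a - 1]) / (1.0 - rho ** 2)
    return total


def ml_variances(T, n, sigma2=1.0):
    """Discrete-sample variances of the maximum-likelihood constant term and slope."""
    t = [r * T / n for r in range(-n, n + 1)]      # 2n + 1 equally spaced sampling times
    ones = [1.0] * len(t)
    rho = math.exp(-T / n)                          # correlation between neighbouring samples
    var_alpha = sigma2 / ar1_quad_form(ones, ones, rho)   # corresponds to sigma^2 * omega / (1)
    var_beta = sigma2 / ar1_quad_form(t, t, rho)          # corresponds to sigma^2 * omega / (tt')
    return var_alpha, var_beta


T, n = 2.0, 2000                                   # assumed values; n large mimics the limit
var_alpha, var_beta = ml_variances(T, n)
print(var_alpha, 1 / (T + 1))                      # compare with (28)
print(var_beta, 3 / (T * (T ** 2 + 3 * T + 3)))    # compare with (29)
```

For large `n` the discrete values settle on the continuous expressions (28) and (29).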










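5.2. Quadratic regression

Substituting into (12), (13) and (14) we get the least-squares solution giving

$$\operatorname{var}(\hat\alpha) = 3\sigma^2 \{6T^5 - 3T^4 - 20T^3 + 45T^2 - 75 + 3e^{-2T}(T^2 + 5T + 5)^2\}/(8T^6), \tag{30}$$
$$\operatorname{var}(\hat\beta) = 3\sigma^2 \{2T^3 - 3T^2 + 3 - 3e^{-2T}(T + 1)^2\}/(2T^6), \tag{31}$$
$$\operatorname{var}(\hat\gamma) = 45\sigma^2 \{2T^5 - 5T^4 + 15T^2 - 45 + 5e^{-2T}(T^2 + 3T + 3)^2\}/(8T^{10}), \tag{32}$$
and substituting into (23), (24) and (25) we get the maximum-likelihood solution
$$\operatorname{var}(\hat\alpha) = 3\sigma^2 (3T^2 + 15T + 20)/\{4(T^3 + 6T^2 + 15T + 15)\}, \tag{33}$$
$$\operatorname{var}(\hat\beta) = 3\sigma^2/\{T(T^2 + 3T + 3)\}, \tag{34}$$
$$\operatorname{var}(\hat\gamma) = 45\sigma^2 (1 + T)/\{4T^3(T^3 + 6T^2 + 15T + 15)\}. \tag{35}$$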
[Fig. 2. Normalized variance of the slope estimators plotted against $T$ (vertical axis 0 to 2.0, horizontal axis 0 to 16). Curves: maximum likelihood; least squares; end-point.]


Once again, the ratios of (33) to (30), (34) to (31) and (35) to (32) give us the relative efficiency of the two
methods. This ratio is also depicted as a percentage on Fig. 1.

The time scale of Fig. 1 has been normalized with respect to the time scale of the parameters of the
stochastic disturbance so that as plotted it is dimensionless. From Fig. 1 it is seen that the efficiency of
the least-squares estimators as compared to the maximum-likelihood estimators falls to a minimum
at a time $T = 1.5$ for the coefficients $\alpha$, $\beta$ of the linear regression and for $\beta$, $\gamma$ of the quadratic regression,
whilst it falls to a minimum at $T = 3.5$ for the coefficient $\alpha$ of the quadratic regression. It is also noted
that the least-squares estimator of the coefficient $\alpha$ is more efficient in quadratic regression than in linear
regression. The minimum efficiency for all cases is seen to be about 69% for $\gamma$, showing that the method
used influences the accuracy of estimation of the higher-order coefficients more than it does the lower ones.
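
The efficiencies discussed above can be tabulated directly from the closed-form variances. The following sketch is illustrative only: it takes the efficiency as the ratio of the maximum-likelihood variance to the least-squares variance for $\rho(\tau) = e^{-|\tau|}$ (so that $\sigma^2$ cancels), and searches an assumed grid of $T$ from 0.5 to 16 for the minimum of each curve. The function and dictionary key names are hypothetical.

```python
import math


def efficiencies(T):
    """Ratios var(ML)/var(LS) under Markov dependence rho(tau) = exp(-|tau|)."""
    e = math.exp(-2.0 * T)
    ls = {
        "alpha_linear": (2 * T - 1 + e) / (2 * T ** 2),
        "beta": 3 * (2 * T ** 3 - 3 * T ** 2 + 3 - 3 * e * (1 + T) ** 2) / (2 * T ** 6),
        "alpha_quadratic": 3 * (6 * T ** 5 - 3 * T ** 4 - 20 * T ** 3 + 45 * T ** 2 - 75
                                + 3 * e * (T ** 2 + 5 * T + 5) ** 2) / (8 * T ** 6),
        "gamma": 45 * (2 * T ** 5 - 5 * T ** 4 + 15 * T ** 2 - 45
                       + 5 * e * (T ** 2 + 3 * T + 3) ** 2) / (8 * T ** 10),
    }
    cubic = T ** 3 + 6 * T ** 2 + 15 * T + 15
    ml = {
        "alpha_linear": 1 / (T + 1),
        "beta": 3 / (T * (T ** 2 + 3 * T + 3)),
        "alpha_quadratic": 3 * (3 * T ** 2 + 15 * T + 20) / (4 * cubic),
        "gamma": 45 * (1 + T) / (4 * T ** 3 * cubic),
    }
    return {name: ml[name] / ls[name] for name in ls}


grid = [0.5 + 0.05 * i for i in range(311)]        # assumed grid: T from 0.5 to 16
minima = {}
for name in ("alpha_linear", "beta", "alpha_quadratic", "gamma"):
    eff, T_at = min((efficiencies(T)[name], T) for T in grid)
    minima[name] = (round(100 * eff, 1), round(T_at, 2))   # (minimum efficiency in %, T)

print(minima)
```

The grid starts at $T = 0.5$ because the least-squares expressions involve heavy cancellation for very small $T$.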










As a corollary to this work we compare a third method of estimating the slope of a regression line. This
estimator is simply obtained by taking the two end-points of the sample at hand and measuring the
slope of the line joining them. We can show that for Markov dependence the variance of this estimator is

$$\sigma^2 (1 - e^{-2T})/(2T^2).$$

Now the variances of the least-squares and maximum-likelihood estimators are $O(1/T^3)$, so we
normalize them by multiplying them by $2T^3/(3\sigma^2)$, so that what we plot in Fig. 2 for all the three estimators
is

$$A = 2T^3 (\text{variance of estimator})/(3\sigma^2).$$

From Fig. 2 we see that the maximum-likelihood estimator is always better than the others, but for
small $T$ ($< 4$) the end-points estimator is better than the least squares.
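
The comparison of the three slope estimators can be reproduced from their variances. The sketch below is illustrative and rests on the closed forms for $\rho(\tau) = e^{-|\tau|}$; the bisection bracket of 1 to 8 and the function names are assumptions made for the example.

```python
import math


def normalized_variances(T):
    """Normalized variance 2 T^3 var / (3 sigma^2) of the three slope estimators."""
    e = math.exp(-2.0 * T)
    max_likelihood = 2 * T ** 2 / (T ** 2 + 3 * T + 3)
    least_squares = (2 * T ** 3 - 3 * T ** 2 + 3 - 3 * e * (1 + T) ** 2) / T ** 3
    end_point = T * (1 - e) / 3
    return max_likelihood, least_squares, end_point


def crossover(lo=1.0, hi=8.0, steps=60):
    """Bisection for the T at which the end-point and least-squares variances are equal."""
    gap = lambda T: normalized_variances(T)[2] - normalized_variances(T)[1]
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if gap(lo) * gap(mid) <= 0:
            hi = mid          # sign change lies in [lo, mid]
        else:
            lo = mid          # sign change lies in [mid, hi]
    return 0.5 * (lo + hi)


T_cross = crossover()
print(T_cross)   # below this T the end-point estimator has the smaller variance
```

Below the crossover the end-point estimator beats least squares, while the maximum-likelihood curve lies below both throughout the assumed range.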


CONCLUSION 


As a result of this investigation we have seen how the least-squares estimators of regression coefficients
compare with maximum-likelihood estimators when autocorrelation is present. For large samples the
two are asymptotically equivalent, but when the sample is small (as defined in § 2) the efficiency of the
least-squares procedure can vary from 69% for the coefficient of the quadratic term $\gamma$ to 90% for the
estimate of the constant term $\alpha$. Thus, if an efficiency of 70% can be tolerated, the method of least
squares is preferable for its facility of application.


The author is indebted to Marconi’s Wireless Telegraph Co. Ltd, for permission to publish this paper 
and to his colleagues for many helpful discussions. 


REFERENCES 


GRENANDER, U. (1954). Ann. Math. Statist. 25, 252.

GRENANDER, U. & ROSENBLATT, M. (1954). Proc. Nat. Acad. Sci., Wash., 40, 812.
RICE, S. O. (1944). Bell Syst. Tech. J. 23, 282.

WOLD, H. (1950). Trans. Int. Inst. Statist. Berne, 32, 277.


Bounds for the variance of Kendall’s rank correlation statistic 


By ALAN STUART 


Research Techniques Unit, London School of Economics 


1. Daniels & Kendall (1947) established that the sampling variance of Kendall's rank correlation
statistic (called $t$ to distinguish it from the population parameter $\tau = E(t)$) obeys the inequality

$$V(t) \le 2(1 - \tau^2)/n = f_1(\tau). \tag{1}$$
They also showed that a class of rankings (called canonical rankings) existed for which
$$0.83 < V(t)/f_1(\tau) < 1. \tag{2}$$

As they pointed out, no great sharpening of the bound to $V(t)$ can be expected in the canonical case.

In general, however, $f_1(\tau)$ is a very poor upper bound for $V(t)$, as Daniels (1950) has shown. In this
paper, a new bound is proposed, which is sharper than $f_1(\tau)$ in rather more than three-quarters of all
possible situations. The new bound is of the form $f_2(\tau, \rho_s)$, where $\rho_s$ is Spearman's rank correlation (grade
correlation) for the population. The ratio $f_2(\tau, \rho_s)/f_1(\tau)$ is studied, and it is found that for given $\tau$, it
decreases steadily as $\rho_s$ does, and when $\rho_s$ is as small as possible the ratio can be as low as $1/(n-1)$,
i.e. the bound given by $f_2$ can be an order of magnitude in $n$ smaller than $f_1$. When $\tau$ and $\rho_s$ are equal,
$f_2$ is about two-thirds of $f_1$.


2. We restrict the discussion to samples from a continuous bivariate population. In this case,
Hoeffding (1947) has shown that

$$V(t) = 8\{p(1-p) + 2(n-2)(k - p^2)\}/\{n(n-1)\} \tag{3}$$










exactly, where $p = \tfrac{1}{2}(1 + \tau)$ is the probability of concordance of a pair of bivariate observations, and $k$ is
the probability that, among three bivariate observations, a specified pair is concordant with each of
the others.


We now define $g$ as the probability that at least one of the three pairs of bivariate observations is
concordant with each of the others, and $d$ as the probability of complete concordance of the three pairs of
observations. Then, using the formula for the probability of realization of at least one among compatible
events, we have
$$g = 3k - 2d. \tag{4}$$
Hoeffding (1948) has shown that

$$g = \tfrac{1}{2}(1 + \rho_s), \tag{5}$$
where $\rho_s$ is the population grade correlation, the analogue of Spearman's rank correlation statistic $r_s$.
Using (4) and (5) in (3), we find

$$V(t) = \frac{2}{n(n-1)} \left\{1 - \tau^2 + \tfrac{2}{3}(n-2)\left[2(1 + \rho_s + 4d) - 3(1 + \tau)^2\right]\right\}. \tag{6}$$

An unbiased estimator of $V(t)$ may be constructed from (6) which is simpler to calculate than the
equivalent form based on (3) or that suggested by Daniels & Kendall (1947). For $n$ at all large, it is the
tediousness of estimating $d$ in (6) or $k$ in (3), each of which requires examination of all $\binom{n}{3}$ triplets of
observations in the sample, which leads us to seek a bound for $V(t)$ instead.
3. Mann (1945) gives a simple proof that
$$\operatorname{Prob}\{X_1 < X_2 < X_3\} \le \operatorname{Prob}\{X_1 < X_2\}\,\operatorname{Prob}\{X_2 < X_3\}. \tag{7}$$

Applied here, (7) gives at once
$$d \le p^2 = \tfrac{1}{4}(1 + \tau)^2, \tag{8}$$

and use of (8) in (6) gives us our new bound

$$V(t) \le \frac{2}{3n(n-1)} \{(2n-1)(1 - \tau^2) + 4(n-2)(\rho_s - \tau)\} = f_2(\tau, \rho_s). \tag{9}$$

Since $\tau^2 = E(t^2) - V(t)$

and (Hoeffding, 1948)

$$E(t) = \tau, \qquad E(r_s) = \{3\tau + (n-2)\rho_s\}/(n+1),$$

we find from (9) a bound for an unbiased estimator, $\hat V(t)$, of $V(t)$, which is

$$\hat V(t) \le \frac{2}{(3n-1)(n-2)} \{(2n-1)(1 - t^2) + 4(n+1)(r_s - t)\}. \tag{9a}$$

The inequality in (9) becomes a strict equality only when that in (8) does, i.e. when $\tau = \rho_s = +1$ or $-1$.
In this degenerate case $V(t) = f_2(\tau, \rho_s) = 0$, from (9).


4. The difference between the two bounds is

$$f_1(\tau) - f_2(\tau, \rho_s) = \frac{2(n-2)}{3n(n-1)} \{1 - \tau^2 - 4(\rho_s - \tau)\}, \tag{10}$$

and is of order $n^{-1}$, just as the bounds themselves are. From (10) we deduce that

$$f_1(\tau) > f_2(\tau, \rho_s)$$
if and only if
$$\rho_s < \tfrac{1}{4}(1 + 4\tau - \tau^2). \tag{11}$$
If $\tau = \rho_s = +1$ or $= -1$, the bounds are both zero, as is the variance.


5. We now study the behaviour of the ratio of the bounds in the interval $-1 < \tau < 1$, namely,

$$f_2(\tau, \rho_s)/f_1(\tau) = \frac{2n-1}{3(n-1)} + \frac{4(n-2)(\rho_s - \tau)}{3(n-1)(1 - \tau^2)}. \tag{12}$$

The only part of (12) we need consider is
$$F(\tau, \rho_s) = \frac{\rho_s - \tau}{1 - \tau^2}. \tag{13}$$

We approach the problem by considering $\rho_s$ as a function of $\tau$, and asking what form this function must
take to give $F(\tau, \rho_s)$ stationary values. Differentiating with respect to $\tau$, we have
$$F'(\tau, \rho_s) = \frac{(\rho_s' - 1)(1 - \tau^2) + 2\tau(\rho_s - \tau)}{(1 - \tau^2)^2},$$

and this is zero when

$$\rho_s' = \frac{2\tau\rho_s - (1 + \tau^2)}{\tau^2 - 1}. \tag{14}$$

This may be solved as a differential equation in $\tau$, and using the initial condition $\rho_s(1) = 1$, we obtain as
the unique solution
$$\rho_s = \tau. \tag{15}$$
Substitution of (15) into (12) gives
$$f_2(\tau, \tau)/f_1(\tau) = \tfrac{1}{3}(2n-1)/(n-1) \tag{16}$$
independent of the value of $\tau$. From (12) and (16), it follows that $f_2(\tau, \rho_s)/f_1(\tau) - \tfrac{1}{3}(2n-1)/(n-1)$ will be
positive or negative with $(\rho_s - \tau)$. Thus (15) represents a 'shoulder' in the surface of $F(\tau, \rho_s)$. There is
no turning point.


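6. For fixed $\tau$, (12) is a monotone increasing function of $\rho_s$. It is thus as small as possible when $\rho_s$ is
a minimum. Daniels (1950) and Durbin & Stuart (1951) have established the sharp bounds for $\rho_s$:

$$\tfrac{1}{2}(3\tau - 1) \le \rho_s \le \tfrac{1}{2}(1 + 2\tau - \tau^2) \quad \text{for } \tau \ge 0,$$
$$\tfrac{1}{2}(\tau^2 + 2\tau - 1) \le \rho_s \le \tfrac{1}{2}(3\tau + 1) \quad \text{for } \tau \le 0. \tag{17}$$
Substituting into (12) the lower bounds for $\rho_s$ in (17), we obtain
$$f_2\{\tau, \tfrac{1}{2}(3\tau - 1)\}/f_1(\tau) = \frac{(2n-1)\tau + 3}{3(n-1)(1 + \tau)} \quad (1 > \tau \ge 0),$$
$$f_2\{\tau, \tfrac{1}{2}(\tau^2 + 2\tau - 1)\}/f_1(\tau) = \frac{1}{n-1} \quad (-1 < \tau \le 0). \tag{18}$$

The ratios in (18) range from $\tfrac{1}{3}(n+1)/(n-1)$ near $\tau = 1$ to $1/(n-1)$ for $\tau \le 0$. It is remarkable that we
have from (9) the bound

$$f_2\{\tau, \tfrac{1}{2}(\tau^2 + 2\tau - 1)\} = \frac{2(1 - \tau^2)}{n(n-1)} \quad (-1 < \tau \le 0), \tag{19}$$

of order $n^{-2}$. (This is not a new type of result; Hoeffding ((1948), p. 318) gave an example where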


"a= n(n—1) 


exactly.) Thus the bound (9) is an order of magnitude in n better than (1) in extreme cases. 
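
The relations between the two bounds are easily checked numerically. The sketch below is illustrative: it codes the bounds (1) and (9) and the limits (17), then evaluates the ratio on the line $\rho_s = \tau$ and at the lowest admissible $\rho_s$ for a negative $\tau$. The sample size `n` and the test values of $\tau$ and $\rho_s$ are arbitrary assumptions.

```python
def f1(tau, n):
    """Daniels-Kendall bound (1)."""
    return 2 * (1 - tau ** 2) / n


def f2(tau, rho_s, n):
    """The bound (9), which also uses the grade correlation rho_s."""
    return 2 * ((2 * n - 1) * (1 - tau ** 2) + 4 * (n - 2) * (rho_s - tau)) / (3 * n * (n - 1))


def rho_s_limits(tau):
    """Sharp lower and upper limits (17) for rho_s at a given tau."""
    if tau >= 0:
        return (3 * tau - 1) / 2, (1 + 2 * tau - tau ** 2) / 2
    return (tau ** 2 + 2 * tau - 1) / 2, (3 * tau + 1) / 2


def f2_is_sharper(tau, rho_s, n):
    return f2(tau, rho_s, n) < f1(tau, n)


n = 20                                                     # assumed sample size
ratio_equal = f2(0.3, 0.3, n) / f1(0.3, n)                 # rho_s = tau, cf. (16)
ratio_lowest = f2(-0.4, rho_s_limits(-0.4)[0], n) / f1(-0.4, n)   # cf. (18)
print(ratio_equal, (2 * n - 1) / (3 * (n - 1)))
print(ratio_lowest, 1 / (n - 1))
```

The function `f2_is_sharper` agrees with condition (11): for $\tau = 0.2$ the threshold is $\rho_s = 0.44$.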


7. In the canonical ranking situations discussed by Daniels & Kendall (1947), it is easy to show that 
1—7\! ; 
pres o(~") (20) 


In (20) $\tau \ge 0$ applies to the inverse canonical ranking, while $\tau < 0$ applies to the direct canonical ranking.
Substitution of (20) into (12) gives 








= 1+|7| | sis 4(n—2) (1—[4(1—7)}* 1 
fafrs1 2( ‘ | faine1+5e—o i ~4}. (21) 
The expression in braces on the right of (21) is 

ae —T)i 

THE AO go (-—l<r<]l), (22) 


so that (21) is $> 1$, and therefore $f_1$ always provides a better bound than $f_2$ in the canonical case.













8. The diagram illustrates our results. The full lines delimit the possible combinations of the coefficients:
$f_2$ is a better bound than $f_1$ for all points below the dashed line. The dotted line, representing the canonical
situations, lies entirely above the dashed line. Integrations show that $f_2$ is a better bound than $f_1$ in 11/14,
or 78.6%, of the possible situations.
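
The proportion quoted can be recovered by integration over the admissible region. The sketch below assumes that "possible situations" means area in the $(\tau, \rho_s)$ plane: the region is delimited by the limits of (17), and the dashed line is taken to be the boundary of condition (11). The midpoint rule and the function names are choices made for this illustration.

```python
def midpoint(f, a, b, m=20000):
    """Midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / m
    return h * sum(f(a + (i + 0.5) * h) for i in range(m))


def lower(tau):
    return (3 * tau - 1) / 2 if tau >= 0 else (tau ** 2 + 2 * tau - 1) / 2


def upper(tau):
    return (1 + 2 * tau - tau ** 2) / 2 if tau >= 0 else (3 * tau + 1) / 2


def dashed(tau):
    return (1 + 4 * tau - tau ** 2) / 4        # boundary of condition (11)


# Area of the whole admissible region of (tau, rho_s)
total = midpoint(lambda t: upper(t) - lower(t), -1.0, 1.0)
# Area of the part lying below the dashed line, where f2 is the sharper bound
below = midpoint(lambda t: min(upper(t), max(lower(t), dashed(t))) - lower(t), -1.0, 1.0)
fraction = below / total
print(fraction, 11 / 14)
```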























REFERENCES 


DANIELS, H. E. (1950). Rank correlation and population models. J. R. Statist. Soc. B, 12, 171.

DANIELS, H. E. & KENDALL, M. G. (1947). The significance of rank correlations when parental
correlation exists. Biometrika, 34, 197.

DURBIN, J. & STUART, A. (1951). Inversions and rank correlation coefficients. J. R. Statist. Soc. B,
13, 303.

HOEFFDING, WASSILY (1947). On the distribution of the rank correlation coefficient $\tau$ when the variates
are not independent. Biometrika, 34, 183.

HOEFFDING, WASSILY (1948). A class of statistics with asymptotically normal distribution. Ann.
Math. Statist. 19, 293.

MANN, HENRY B. (1945). Non-parametric tests against trend. Econometrica, 13, 245.













A note on the theory of quick tests 


By D. R. COX 
Department of Biostatistics, School of Public Health, University of North Carolina 


1. There has been a good deal of interest recently in quick approximate methods of examining statistical
significance. The object of this paper is to make some general comments on the interpretation and
justification of such methods. An example of a quick test is the sign test for a normal mean; given a
random sample of size $n$ from a normal population of unknown mean $\mu$, the hypothesis that $\mu = 0$ is
tested by referring the number of positive observations to the binomial distribution with parameter
$\tfrac{1}{2}$ and index $n$. The efficient test, the $t$ test, is here so simple to work out that it is not very often that
a quick test is required to replace it, the real value of quick tests being in more complicated situations,
where the fully efficient test may be difficult to apply. Thus in a problem where the efficient analysis calls
for say the solution of a complex non-orthogonal set of least-squares equations, the use of an alternative
'inefficient' procedure may be very profitable. However, it is convenient to use simple cases for
illustration.


2. We consider in this paper a very special situation. Let it be required to test the null hypothesis
$\theta = \theta_0$, concerning the single unknown parameter $\theta$, and let the theoretical set-up be sufficiently simple
for the best significance test, based on a statistic $t$ say, to be uniquely determined and let the proposed
quick test be based on the statistic $q$. Assume that $t$ and $q$ are both unbiased estimates of $\theta$ and deal
with the large-sample approximation in which $t$ and $q$ follow a bivariate normal frequency distribution
with a variance-covariance matrix which can be treated as independent of $\theta$. If the variances of $t$ and $q$
are $\sigma^2$ and $\sigma^2/E$, Fisher (1925) showed that the correlation coefficient between $t$ and $q$ is $\sqrt{E}$, where $E$ is
the efficiency of $q$ relative to $t$.

The general remarks below apply quite generally but the quantitative analysis is restricted to the 
situations just described. 


3. The customary method of comparing the quick and the efficient tests is by power curves. That is,
we plot against $\theta$ the probability of attaining significance at say the $\alpha$ per cent level, first with the
efficient test and then with the quick test. The drop in power from the efficient to the quick test is then
an indication of the effect on the long-run behaviour of replacing the efficient test by the quick test.
In the present case the power curves are expressed in terms of the normal probability integral.


4. The power curves define the long-run behaviour of the two tests considered individually, but
give no direct indication of the extent to which the answers obtained by applying the tests to the same
data tend to agree. This last, however, seems of interest. Some knowledge of it gives us a clearer picture
of the comparative behaviour of the two tests in applications, and also helps to deal with the following
situation that can arise in using quick tests. Suppose that we have made a quick analysis of some data.
If the full analysis would take an appreciable time we may well ask: how likely is it that we should get
essentially the same answer from the efficient test as we have already obtained from the quick test?
If we are convinced that the efficient analysis, if made, would only reproduce the conclusions of the quick
analysis, we may rest content with the quick analysis. If, however, there is a possibility of materially
different conclusions, i.e. of a greatly changed level of significance, we would often decide to go ahead
with the full analysis. This would be especially so in scientific work where the data are to be published
and may be reanalysed by future workers.


5. The considerations of § 4 suggest that we should examine the regression of $t$ on $q$, i.e. that we should
consider the problem of predicting the efficient statistic $t$ given the value of the quick statistic $q$. An
alternative formulation would be to ask how likely it is, given $q$, that $t$ exceeds its $\alpha$ per cent critical value
for testing the null hypothesis. This approach seems, however, rather less helpful since we are usually
interested to some extent in the actual level of significance attained by the data and not just in
whether some preselected level is exceeded.


6. For given $\theta$, $q$ it follows from the properties of the bivariate normal surface that $t/\sigma$ is normally
distributed with mean $\theta(1 - E)/\sigma + qE/\sigma$ and variance $1 - E$. This involves the unknown parameter $\theta$
and while we could, for some purposes, consider the whole set of joint distributions as a function of the
unknown parameter $\theta$, it is an advantage to introduce an averaging operation over $\theta$. Therefore we assume
Bayes's hypothesis that the prior distribution of $\theta$ is uniform; then, since, for given $\theta$, $q$ is normally
distributed around $\theta$ with variance $\sigma^2/E$, $\theta$, for given $q$, is normally distributed around $q$ with variance
$\sigma^2/E$. Hence $t/\sigma$ is normally distributed with mean $q/\sigma$ and variance $(1 - E)/E$. Or equivalently, if the







normal deviate based on $q$ for testing the null hypothesis $\theta = \theta_0$, is $q' = (q - \theta_0)\sqrt{E}/\sigma$, the corresponding
normal deviate, $t' = (t - \theta_0)/\sigma$ has, given $q'$, mean $q'/\sqrt{E}$ and variance $(1 - E)/E$.


7. The distribution of $t'$ given $q'$ just derived depends for its frequency interpretation on the truth of
Bayes's hypothesis as a statement about the frequencies of values of $\theta$ in a long series of applications of
the test. It is not usual to base a theory of statistical inference on Bayes's hypothesis without abandoning
the frequency interpretation of probability, but in the present paper we are merely exploring in a general
way the properties of quick tests. It is perfectly reasonable to ask how the quick test would behave if
$\theta$ were distributed in some special way, and if a special form is to be chosen the uniform distribution
seems a natural one.

The distribution of $t'$ given $q'$ in the range of most practical interest depends on the form of the prior
distribution in the interval, say $|\theta - \theta_0| < 6\sigma$; in large samples $\sigma$ will be small and so this will represent
a narrow interval for $\theta$ over which the prior probability is, in some types of application, likely to vary
little. If a prior distribution for $\theta$ exists and is not constant over this range the results will be affected.
For example, if the prior probability density is higher at the centre of the interval, the value of $|t'|$
given $q'$ will be reduced.


8. A few numerical results derived from the formulae of § 6 are given in Table 1. Some examples will 
now be given to illustrate this table. 


Table 1. Relation between standardized deviates $t'$, $q'$ when $\theta$ has a special prior distribution

  E       Mean of t' given q'      Standard error
          (as a multiple of q')    of t' given q'
 0.99         1.005                   0.101
 0.95         1.026                   0.229
 0.90         1.054                   0.333
 0.80         1.118                   0.500
 0.70         1.195                   0.655
 0.60         1.291                   0.816
 0.50         1.414                   1.000
 0.40         1.581                   1.225
 0.30         1.826                   1.528
 0.20         2.236                   2.000
 0.10         3.162                   3.000
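
The entries of Table 1 follow from the formulae of § 6. As a minimal sketch (the function name `table_row` is an assumption made for this illustration), the two columns are $1/\sqrt{E}$, the multiplier of $q'$ giving the mean of $t'$, and $\sqrt{\{(1-E)/E\}}$, the standard error:

```python
import math


def table_row(E):
    """Multiplier of q' giving the mean of t', and the standard error of t' given q'."""
    return 1 / math.sqrt(E), math.sqrt((1 - E) / E)


for E in (0.99, 0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10):
    mean_factor, std_error = table_row(E)
    print(f"{E:.2f}  {mean_factor:.3f}  {std_error:.3f}")
```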

















Example 1. Suppose that a test with Z = 0-99 gives a value just significant at 5%, i.e. g’ = 1-645. 
Then ¢’ has mean 1-653 and standard error 0-101. Therefore ¢’ probably lies between 


1-653 + 0-101 = (1-552, 1-754) 
and fairly certainly lies between 
1-653 + 1-96 x 0-101 = (1-455, 1-851). 


The significance levels corresponding to the two sets of limits are approximately (0-060, 0-040) and 
(0-073, 0-032). 

As would be expected with so high an efficiency, serious disagreement between the results of the tests 
is unlikely. 

Example 2. Suppose that a test with E = 0.80 gives satisfactory agreement with the null hypothesis, 
say q’ = ½. Then t’ has mean 0.559 and standard error ½. The two sets of limits for t’ are therefore (0.059, 
1.059) and (−0.421, 1.539). Therefore it is unlikely that the level of significance with t’ is higher than 
about 6%. 

For many purposes the efficiency would be considered quite high, and yet appreciable disagreement 
between the two tests will arise quite often. 

Example 3. Consider a very inefficient test with E = 0.2 and suppose that it gives exact agreement 
with the null hypothesis, i.e. q’ = 0. Then since the standard error of t’ given q’ is 2, there is an appreciable 
chance that t’ would have given significance at a high level. If, on the other hand, q’ = 2, indicating 
significance at the 2½% level, t’ has mean 4.472 and standard error 2, so that it is rather unlikely that 
t’ fails to indicate high significance. 











480 Miscellanea 


This example expresses quantitatively the obvious point that failure to get significance with a very 
inefficient test gives no information about whether significance would be obtained with an efficient test, 
but that if the quick test gives significance it is likely that the efficient one will too. 

It must again be stressed that these are illustrations of the relation between t’ and q’ for a special 
prior distribution of θ. 


9. The estimation formulae analogous to the above results for significance tests are as follows. Let 
q be an estimate of θ with standard error $\sigma_q$ and efficiency E, and let t be the efficient estimate from the 
same sample. Then if θ has a uniform prior distribution, (t − q) has mean 0 and standard error $\sigma_q\sqrt{(1-E)}$, 
thus enabling us to assess from the quick estimate whether our conclusions are likely to be appreciably 
modified by computing the efficient estimate. This will not be discussed further here. 


10. Two general points need to be made about the above calculations. First, it is in practice sometimes 
possible to estimate approximately the probable direction and nature of the difference between t and q 
by careful inspection of the observations, the main difference between t and q often being in the weight 
they attach to extreme observations. Secondly, all the above work is based on what happens when the 
observations follow the distribution law assumed in deriving the tests; the relative robustness of the 
two tests, which is relevant in choosing between them, has not been considered. 


11. The general purpose of these results has been discussed in § 4. A direct application of the formulae 
to the analysis of data would be as follows. First apply the quick test; then by the method of § 7 work 
out the probable limits for t’; to do this the efficiency E of the quick method needs to be known. If it 
seems possible that t’ may give a seriously different conclusion from q’, t’ itself is calculated and the 
significance level given by it used. Of course the selection of that test giving the more significant answer 
is wrong. This procedure needs further investigation before it can be generally recommended. 

Example 4. In a test of two methods A, B of determining by sampling the total number of bacteria 
on a plate, twenty plates were counted by both methods. The differences between the counts by A and B 
were 3, −3, 3, 6, −8, −1, 6, 8, 5, −8, 11, −1, 5, 6, 5, −6, 3, 7, 14, 8. Is there evidence of bias? 

There are 14 positive observations out of 20 so that, using the sign test, q’ = (14 − 10) √4/√20 = 1.79, 
which is significant at 10% in a two-sided test. (If we correct for continuity the result is less significant.) 
Since for this test E = 2/π (Cochran, 1937), t’ has mean 1.79 √π/√2 and standard error √[(π − 2)/π], 
i.e. is 2.24 ± 0.60, so that t’, which in this case is Student’s t, could, consistently with q’, take values 
representing widely differing significance levels. Therefore it is worth working out t’, and its value comes 
to 2.35 (0.05 > P > 0.02), in good agreement with the value predicted from q’. Notice that there are no 
very large deviations; if say the second observation had been −18, instead of −3, q’ would have been 
unaltered. However, we would have noted this large negative deviation and would not have expected 
t’ to give a more significant answer than the quick test. 
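
The figures of Example 4 can be reproduced directly from the twenty differences. The following is a sketch only; the function names are illustrative, and the sign-test deviate is computed without the continuity correction, as in the text.

```python
import math
from statistics import mean, stdev

differences = [3, -3, 3, 6, -8, -1, 6, 8, 5, -8,
               11, -1, 5, 6, 5, -6, 3, 7, 14, 8]

def sign_test_deviate(xs):
    """Standardized deviate of the number of positive signs (no continuity correction)."""
    n = len(xs)
    positives = sum(1 for x in xs if x > 0)
    return (positives - n / 2) / math.sqrt(n / 4)

def student_t(xs):
    """One-sample Student's t for a zero mean."""
    return mean(xs) / (stdev(xs) / math.sqrt(len(xs)))

q = sign_test_deviate(differences)
t = student_t(differences)
predicted_mean = q / math.sqrt(2 / math.pi)   # mean of t' given q' with E = 2/pi

print(f"q' = {q:.2f}, predicted mean of t' = {predicted_mean:.2f}, t' = {t:.2f}")
```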


Summary. It is suggested that in comparing an inefficient quick test with an efficient test we should 
consider the extent to which the results of applying the two tests to the same data tend to agree, in 
addition to the long-run behaviour of the tests considered separately. A very simple case is considered 
quantitatively. 


I am grateful to Prof. E. S. Pearson for very helpful criticism. 


REFERENCES 


Cochran, W. G. (1937). The efficiencies of the binomial series tests of significance of a mean and of 
a correlation coefficient. J. R. Statist. Soc. 100, 69-73. 
Fisher, R. A. (1925). Theory of statistical estimation. Proc. Camb. Phil. Soc. 22, 700-25. 


A note on the signs of gross correlation coefficients and partial correlation coefficients 


By OLAV REIERSOL 
University of Oslo, Norway 


We shall use the following matrix theorem: 


THEOREM 1. If the principal minor determinants of a square matrix A and the determinant value of A 
itself are all positive, and if the non-diagonal elements of A are all negative, then all elements of the adjoint 
of A are positive. 










This theorem was proved in a slightly different form by Mosak (1944, pp. 49-51). The theorem was 
stated in its present form by Metzler (1950, p. 340). 

Remark. Theorem 1 still remains true if ‘adjoint’ is replaced by ‘inverse’. 

It is evident from the proof of Metzler that Theorem 1 may be sharpened to 

THEOREM 1’. If the principal minor determinants of a square matrix A are all non-negative, and if the 
non-diagonal elements of A are all negative, then all non-diagonal elements of the adjoint of A are positive. 

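Let 

$$R = \begin{bmatrix} 1 & r_{12} & \cdots & r_{1n} \\ r_{21} & 1 & \cdots & r_{2n} \\ \vdots & \vdots & & \vdots \\ r_{n1} & r_{n2} & \cdots & 1 \end{bmatrix}$$

be the correlation matrix of a set of variables $x_1, x_2, \ldots, x_n$. The partial correlation coefficients of highest 
order of the set are given by 

$$\rho_{ij} = -\frac{\operatorname{cof.}(r_{ij})}{\{\operatorname{cof.}(r_{ii})\,\operatorname{cof.}(r_{jj})\}^{1/2}}, \tag{1}$$

where cof. ($r_{ij}$) is the cofactor of $r_{ij}$ in the matrix R. We see that the sign of a partial correlation coefficient 
is the opposite of the sign of the corresponding element of the adjoint of R. Applying Theorem 1’ we 
thus get: 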


THEOREM 2. If all gross correlation coefficients of a set of variables are negative, all partial correlation 
coefficients of all orders of the set are also negative. 
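
A small numerical illustration of formula (1) and Theorem 2 may be helpful. The sketch below computes the highest-order partial correlations from cofactors for a hypothetical 3 × 3 correlation matrix with all gross correlations negative; the matrix entries and the function names are illustrative assumptions, not taken from the note.

```python
def minor(matrix, i, j):
    """Matrix with row i and column j removed."""
    return [[v for c, v in enumerate(row) if c != j]
            for r, row in enumerate(matrix) if r != i]

def det(matrix):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(matrix) == 1:
        return matrix[0][0]
    return sum((-1) ** j * matrix[0][j] * det(minor(matrix, 0, j))
               for j in range(len(matrix)))

def cofactor(matrix, i, j):
    return (-1) ** (i + j) * det(minor(matrix, i, j))

def highest_order_partials(R):
    """Partial correlations of highest order, following formula (1)."""
    n = len(R)
    return {(i, j): -cofactor(R, i, j)
                    / (cofactor(R, i, i) * cofactor(R, j, j)) ** 0.5
            for i in range(n) for j in range(i + 1, n)}

# Hypothetical correlation matrix with all gross correlations negative.
R = [[1.0, -0.3, -0.2],
     [-0.3, 1.0, -0.4],
     [-0.2, -0.4, 1.0]]
partials = highest_order_partials(R)
print(partials)
```

In this instance every partial correlation is negative, as Theorem 2 requires; for three variables the value for the first pair agrees with the familiar expression $(r_{12} - r_{13}r_{23})/\{(1 - r_{13}^2)(1 - r_{23}^2)\}^{1/2}$.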

We shall next consider the case when all partial correlation coefficients of highest order are positive. 
Then all non-diagonal elements of the adjoint of R are negative. 

Suppose first that R is singular. Then the rows of the adjoint of R are proportional. Since the diagonal 
elements of the adjoint of R are all positive, the non-diagonal elements cannot all be negative when 
n ≥ 3. We conclude that R is non-singular when all partial correlation coefficients of highest order are 
positive. The inverse matrix $Q = R^{-1}$ thus exists, and its non-diagonal elements are all negative. Applying 
Theorem 1 we conclude that the elements of $R = Q^{-1}$ are all positive. 

Let R and Q be partitioned symmetrically into submatrices in the same way 

$$R = \begin{bmatrix} R_{11} & R_{12} \\ R_{21} & R_{22} \end{bmatrix} \quad\text{and}\quad Q = \begin{bmatrix} Q_{11} & Q_{12} \\ Q_{21} & Q_{22} \end{bmatrix}.$$

Then (Hotelling, 1943, p. 4) 

$$R_{11}^{-1} = Q_{11} - Q_{12}Q_{22}^{-1}Q_{21}.$$

Using the information we have about the signs of the submatrices of Q and applying Theorem 1 to the 
matrix $Q_{22}$, we conclude that the non-diagonal elements of $R_{11}^{-1}$ are all negative. Now any partial 
correlation coefficient of any order of the set $x_1, x_2, \ldots, x_n$ has opposite sign of a non-diagonal element of some 
$R_{11}^{-1}$ of the original correlation matrix or of a matrix which we obtain from R by an interchange of rows 
and the corresponding interchange of columns, hence we obtain: 


THEOREM 3. If all partial correlation coefficients of highest order of a set of variables are positive, all 
partial correlation coefficients of lower orders are also positive, and all gross correlation coefficients are positive. 

If we change the signs of all elements of one or more of the rows of a square matrix A and afterwards 
change the signs of all elements of the corresponding columns, we have performed a particular kind of 
cogredient (also called congruent) transformation of the matrix A. Moreover, this is the only cogredient 
transformation whose only effect is to change signs. We shall therefore call it a cogredient change of signs 
of the matrix A. We shall adopt the convention that no change of signs is also a cogredient change of 
signs. 

A cogredient change of signs of the matrix R will also be called a cogredient change of signs of the 
gross correlation coefficients of the set $x_1, \ldots, x_n$. If P is the matrix whose elements are defined by (1), 
a cogredient change of signs of P will also be called a cogredient change of signs of the partial correlation 
coefficients of highest order of the set of variables $x_1, \ldots, x_n$. 

The following generalization of Theorems 2 and 3 is easily proved by performing a change of signs of 
a subset of the set of variables $x_1, \ldots, x_n$: 


THEOREM 4. If the signs of all gross correlation coefficients of a set of variables may be made negative by 
a cogredient change of signs, or if the signs of all partial correlation coefficients of highest order of the set may 
be made positive by a cogredient change of signs, then, without any change of signs, any partial correlation 
coefficient of any order has the same sign as the corresponding gross correlation coefficient. 




REFERENCES 


Hotelling, H. (1943). Some new methods in matrix calculation. Ann. Math. Statist. 14, 1-34. 

Metzler, L. A. (1950). A multiple-region theory of income and trade. Econometrica, 18, 329-54. 

Mosak, J. L. (1944). General Equilibrium Theory of International Trade. Cowles Commission Monograph 
no. 7, Bloomington, Indiana. 


The estimation of the mean of a censored normal distribution by ordered variables 


By P. G. MOORE 
University College, London 


1. In many practical problems the final data obtained are incomplete in that time or equipment has 
not been available for completion of the originally planned experiment or that something has gone 
wrong, leaving some observations missing. We suppose that the observations form a random sample 
from a normal population, but that the sample is in some way censored, meaning that all observations 
above some limit, or below another limit or outside both limits, are unavailable to us for the purpose of 
estimating the mean. The only information available to us about the missing observations is their 
total number. 


2. One method of approach is to consider the maximum-likelihood solution. Details of this with 
appropriate auxiliary tables have been given by Hald (1949) and Gupta (1952). In the case of small 
samples the two solutions differ. Hald supposes that of the n observations in the sample only the r 
values of the variable, x, which fall on one side of a fixed truncation point are available; r will thus vary 
binomially from sample to sample. Gupta supposes that if the observations arranged in order of magnitude 
are $x_1, x_2, \ldots, x_n$, then the values of $x_1, \ldots, x_r$ are known but not the values of $x_{r+1}, \ldots, x_n$. Here r is 
fixed. The asymptotic variance of both estimators is, however, the same, and it is this variance which is 
used below in comparing the efficiency of the maximum-likelihood estimator with alternative estimators. 

The purpose of this note is to point out that by using methods of estimation of the mean suggested 
in an editorial paper of Karl Pearson (1920) we can often get results that are nearly as good as the 
maximum-likelihood results at a great saving of time and calculation. Pearson’s technique depends on 
the fact that even though a sample has been censored many of the ordered variables are available to us 
and the problem is how best to use those values. 


3. For computations use will be made of the expansions given by David & Johnson (1954) for ordered 
variables in a random sample of n individuals drawn from a population whose probability density 
function is f(x) with mean ξ and standard deviation unity. We define 

$$F(X) = \int_{-\infty}^{X} f(x)\,dx, \tag{1}$$

and arrange the sample observations in order of magnitude to give values $x_1, x_2, \ldots, x_n$. Defining $X_r$ by 
the relation 

$$F(X_r) = \frac{r}{n+1} \quad (r = 1, 2, \ldots, n)$$

and expanding $x_r$ about $X_r$ in an inverse Taylor series gives 

$$x_r = X_r + X_r'[h(x_r)] + \tfrac{1}{2}X_r''[h(x_r)]^2 + \cdots, \tag{2}$$

where $h(x_r) = F(x_r) - F(X_r)$ and $X_r' = dX_r/dF(X_r)$, and so on. 


The utilization of such expansions enables various approximations to the moments of ordered variables 
to be obtained, and David & Johnson give tables of them in terms of powers of $(n+2)^{-1}$. The accuracy 
of these expansions was examined for a few cases of sampling from a normal population. For example, 
using (2), the variance of the median is 

$$\frac{1.570796}{n+2} + \frac{2.467401}{(n+2)^2} + \frac{3.462732}{(n+2)^3}, \tag{3}$$

and numerical values derived from this formula are compared below with exact values for low sample 
sizes given by Hojo (1931): 

    Median in sample of n      n = 5       n = 7       n = 9       n = 11 
    Exact variance             0.286834    0.210446    0.166093    0.137227 
    Approximate variance       0.284849    0.209744    0.165793    0.137164 
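
Formula (3) is easily evaluated. The sketch below uses the three printed coefficients with all signs taken as positive, which is the reading that agrees with the tabulated approximate variances for n = 5, 7 and 9 to about six decimal places; the function name is illustrative.

```python
def median_variance_approx(n):
    """Three-term approximation (3) to the variance of the median, unit-variance normal."""
    m = n + 2
    return 1.570796 / m + 2.467401 / m**2 + 3.462732 / m**3

for n in (5, 7, 9):
    print(n, round(median_variance_approx(n), 6))
```

The leading coefficient is π/2, so that the first term alone gives the familiar large-sample variance of the median.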


For samples up to size 10 the variances and covariances of all ordered variables have been given by 
Godwin (1949). As an illustration, we compare two cases for samples of n = 10 using terms in (2) up to 
$(n+2)^{-3}$. From these examples it seems that the expansions would give sufficient accuracy for practical 
purposes provided n was greater than about 10. 

                           var ($x_3$)    covar ($x_3$, $x_8$) 
    Exact value            0.17500        0.06302 
    Approximate value      0.17485        0.06297 


4. If we use a pair of the ordered variates to estimate the mean, there are two cases to be considered, 
as we may either use two symmetrically placed variates, that is, $x_r$ and $x_{n+1-r}$, or we may alternatively 
use an unsymmetrical pair. For a one-sided censored form of distribution more observations are available 
to us on one side of the central value than on the other, and to make use of a symmetrical pair means 
that a number of observations may appear to be needlessly wasted. Considering the symmetrical case 
first we find from the expressions in (2) that for large samples the estimate 

$$V_r = \tfrac{1}{2}(x_r + x_{n+1-r}) \tag{4}$$

has as variance the approximate value 

$$\text{variance}\,(V_r) = \frac{X_r'^{\,2}\,p_r}{2n}, \tag{5}$$

where $p_r = F(X_r) = r/(n+1)$. 

We find, on substitution, that the asymptotic efficiency of $V_r$ as compared with that of the mean of the 
complete sample is $2z_r^2/p_r$, where $z_r$ is the ordinate of the standardized normal distribution corresponding 
to a probability integral of $p_r$. This function is tabulated in Table 1 together with the asymptotic 
maximum-likelihood efficiencies from Gupta’s work based on the assumption that the distribution is censored 
at one end only. From the table we see that the efficiency of $V_r$ increases with $p_r$ to a maximum at about 
$p_r$ equal to 0.27 and then decreases to the value 0.637 which occurs when just the median is used for the 
estimation. Pearson showed that the maximum efficiency occurs when $p_r$ is equal to 0.271, and the table 
shows that if the censoring at either end is greater than 27.1% we should take the symmetrical pair 
that lies farthest out from the centre of the distribution, whilst if it is less we should take the pair that 
makes r/(n + 1) as near 0.27 as possible. 


Table 1. Efficiencies of estimates using a pair of quantiles 

            Efficiency  Maximum-  |          Efficiency  Maximum-  |          Efficiency  Maximum- 
    $p_r$   of $V_r$    likelihood|  $p_r$   of $V_r$    likelihood|  $p_r$   of $V_r$    likelihood 
                        efficiency|                      efficiency|                      efficiency 
    0.10    0.616       0.980     |  0.24    0.805       0.920     |  0.38    0.763       0.806 
    0.12    0.667       0.974     |  0.26    0.809       0.907     |  0.40    0.746       0.784 
    0.14    0.708       0.967     |  0.28    0.809       0.893     |  0.42    0.728       0.763 
    0.16    0.740       0.959     |  0.30    0.806       0.878     |  0.44    0.707       0.738 
    0.18    0.765       0.950     |  0.32    0.799       0.861     |  0.46    0.685       0.714 
    0.20    0.784       0.941     |  0.34    0.790       0.845     |  0.48    0.661       0.685 
    0.22    0.797       0.930     |  0.36    0.778       0.826     |  0.50    0.637       0.659 
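
The "Efficiency of $V_r$" column can be recomputed from the expression $2z_r^2/p_r$. The following is a minimal sketch using the standard-library normal distribution; the function name and the grid search for the maximum are illustrative choices, not part of the original calculation.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()

def symmetric_pair_efficiency(p):
    """Asymptotic efficiency 2 z^2 / p of the mean of two symmetric quantiles."""
    z = STANDARD_NORMAL.pdf(STANDARD_NORMAL.inv_cdf(p))  # ordinate at the p-quantile
    return 2.0 * z * z / p

for p in (0.10, 0.20, 0.30, 0.40, 0.50):
    print(f"{p:.2f}  {symmetric_pair_efficiency(p):.3f}")

# Coarse grid search for the most efficient symmetric pair.
best_p = max((i / 1000 for i in range(100, 501)), key=symmetric_pair_efficiency)
print("maximum near p_r =", best_p)
```

The grid search places the maximum close to $p_r$ = 0.27, in line with the value attributed to Pearson.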








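Next we consider whether it is possible to improve the estimates by using an unsymmetrical pair of 
quantiles. In this case we have to use a weighted average of the pair. Let the two variates be $x_r$ and $x_s$ 
and consider 

$$\hat{\xi} = \frac{x_r\,\Theta\!\left(\dfrac{s}{n+1}\right) - x_s\,\Theta\!\left(\dfrac{r}{n+1}\right)}{\Theta\!\left(\dfrac{s}{n+1}\right) - \Theta\!\left(\dfrac{r}{n+1}\right)} = V_{rs} \ \text{(say)}, \tag{6}$$

where Θ(V) is the deviate of the standardized normal curve corresponding to a cumulative probability 
of V. Then if the unknown underlying normal distribution has mean ξ and standard deviation σ, we have 

$$\mathcal{E}\!\left(\frac{x_r - \xi}{\sigma}\right) = \Theta\!\left(\frac{r}{n+1}\right), \qquad \mathcal{E}\!\left(\frac{x_s - \xi}{\sigma}\right) = \Theta\!\left(\frac{s}{n+1}\right),$$

and hence on substitution in (6) and reduction, we have 

$$\mathcal{E}(\hat{\xi}) = \xi.$$

Further, as an approximation to the variance, we find 

$$(n+2)\operatorname{var}(V_{rs}) = \frac{\sigma^2}{K^2}\left\{\frac{p_r(1-p_r)}{z_r^2}\,\Theta^2(p_s) + \frac{p_s(1-p_s)}{z_s^2}\,\Theta^2(p_r) - \frac{2p_r(1-p_s)}{z_r z_s}\,\Theta(p_r)\,\Theta(p_s)\right\}, \tag{7}$$

where 

$$K = \Theta\!\left(\frac{s}{n+1}\right) - \Theta\!\left(\frac{r}{n+1}\right),$$

$p_r = r/(n+1)$, $p_s = s/(n+1)$ and r < s. 

Using (6) and (7), the optimum values of $p_s$ have been found for four different values of $p_r$ together with 
the new efficiencies. Although there is some gain, the gain is not in general very large. 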





Table 2. Efficiencies of $V_{rs}$ 

    $p_r$                          0.30     0.35     0.40     0.45 

    Symmetrical      $p_s$         0.70     0.65     0.60     0.55 
                     Efficiency    0.806    0.784    0.746    0.696 
    Unsymmetrical    $p_s$         0.74     0.73     0.72     0.67 
                     Efficiency    0.808    0.794    0.764    0.716 
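
The efficiencies for a weighted pair of quantiles can be sketched numerically as below. The large-sample variances and covariance of the two quantiles are taken in their usual first-order form, $p_r(1-p_r)/z_r^2$, $p_s(1-p_s)/z_s^2$ and $p_r(1-p_s)/(z_r z_s)$ for r < s (each divided by n + 2), and the weights are those which make the estimate unbiased; this is offered as an illustration under those assumptions, and the function name is hypothetical.

```python
from statistics import NormalDist

_N = NormalDist()

def pair_efficiency(p_r, p_s):
    """Large-sample efficiency of the weighted pair relative to the full-sample mean."""
    t_r, t_s = _N.inv_cdf(p_r), _N.inv_cdf(p_s)   # normal deviates at p_r < p_s
    z_r, z_s = _N.pdf(t_r), _N.pdf(t_s)           # corresponding ordinates
    k = t_s - t_r
    scaled_var = (p_r * (1 - p_r) * t_s**2 / z_r**2
                  + p_s * (1 - p_s) * t_r**2 / z_s**2
                  - 2 * p_r * (1 - p_s) * t_r * t_s / (z_r * z_s)) / k**2
    return 1.0 / scaled_var

print(round(pair_efficiency(0.30, 0.70), 3))   # symmetrical pair
print(round(pair_efficiency(0.30, 0.74), 3))   # unsymmetrical pair
print(round(pair_efficiency(0.40, 0.72), 3))   # unsymmetrical pair
```

With $p_s = 1 - p_r$ the expression reduces to the symmetrical case, so the first line of output can be compared with the corresponding entry for the symmetrical pair.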














5. To improve the efficiency of the estimate four ordered variates placed in two pairs could be used. 
Thus the estimator 

$$V_\alpha = \tfrac{1}{2}\alpha(x_r + x_{n+1-r}) + \tfrac{1}{2}(1-\alpha)(x_s + x_{n+1-s})$$

would have the mean of the sampled population as its expected value. The efficiency would vary with 
the value of α that is taken and also with the values of r and s chosen. We will assume that $0.5 \ge p_s > p_r$. 
Table 3 is a modified form of Pearson’s table and shows, for large samples, the optimum values of $p_s$ 
and α for various values of $p_r$ together with the efficiency thereby achieved and the corresponding 
maximum-likelihood efficiency when a portion $p_r$ is censored. In Table 4 we take the case where $p_s$ is 
equal to 0.5, i.e. we use the median and two symmetrically placed variates. The optimum efficiencies 
are now somewhat greater than when just two variates were used and are approaching the 
maximum-likelihood values. 


Table 3. Efficiencies of $V_\alpha$ 

    $p_r$                             0.1      0.2      0.25 
    $p_s$                             0.35     0.42     0.42 
    α                                 0.374    0.542    0.615 
    Efficiency                        0.912    0.897    0.872 
    Maximum-likelihood efficiency     0.980    0.941    0.913 


Table 4. Efficiencies of $V_\alpha$ with $p_s$ = 0.5 

    $p_r$                             0.1      0.2      0.3      0.4 
    α                                 0.51     0.38     0.25     0.15 
    Efficiency                        0.866    0.878    0.834    0.750 
    Maximum-likelihood efficiency     0.980    0.941    0.878    0.784 


6. The estimates could be improved by using more variates but would then become rather unwieldy. 
The results obtained would seem to show that if there is a fair amount of censoring, say over 20% of 
the distribution, then the simple estimates based on ordered variables may suffer very little in efficiency 
when compared with a maximum-likelihood estimate of the mean. On the other hand, when there is 
very little censoring, say under 10%, there is a considerable gain in using the maximum-likelihood 
methods of estimation. In a situation which involves censoring at both ends of the distribution, the 
ordered variates would be even better vis-à-vis the one-ended maximum-likelihood estimate, but such 
cases would occur rarely in practice. 


Later note. The recent publication of a paper by D. Teichroew (1956, Ann. Math. Statist. 27, 410) 
enables the exact efficiency of $V_r$ to be calculated and compared with the approximate efficiencies 
obtained as in Table 1 for the case of a sample of size 20. 

                         Efficiency of $V_r$ 
     r      $p_r$        Exact     Approximate 
     4      0.1905       0.780     0.775 
     5      0.2381       0.816     0.805 
     6      0.2857       0.824     0.808 
     7      0.3333       0.810     0.796 
     8      0.3810       0.779     0.763 

Thus the approximate formula slightly underestimates the efficiencies for a sample of size 20 and $V_r$ 
is in fact slightly better, as compared with the maximum-likelihood estimator, than it appeared to 
be from Table 1. 


REFERENCES 


David, F. N. & Johnson, N. L. (1954). Biometrika, 41, 228. 
Godwin, H. J. (1949). Ann. Math. Statist. 20, 279. 
Gupta, A. K. (1952). Biometrika, 39, 260. 
Hald, A. (1949). Skand. AktuarTidskr. 33, 119. 
Hojo, T. (1931). Biometrika, 23, 315. 
Pearson, K. (1920). Biometrika, 13, 113. 


A note on Wilcoxon’s and allied tests 


By F. N. DAVID 
University College, London 


1. It is assumed that there are available two samples of $n_1$ and $n_2$ observations randomly and independently drawn 
from each of two populations supposedly identical under the null hypothesis. The $n_1 + n_2$ joint sample 
values are ranked in ascending order of magnitude thus forming a random sequence of two alternatives 
in which order must be taken into account. Mann & Whitney’s adaptation of Wilcoxon’s test consists, 
in essence, of taking the sum of the ranks of one of the alternatives. This criterion is used to test for the 
equivalence of the parameters of location of the two populations. Rosenbaum (1953) and Kamat (1956) 
have also discussed criteria based on the random sequence which may be used, if desired, to investigate 
the equivalence of the parameters of dispersion of the two populations. It is the purpose of this present 
note to suggest a method of approach to tests on the random sequence whereby symmetric functions 
of the ranked observations may be used. Since such an approach is based on approximating to the 
distribution of a criterion by means of its moments, it will only be useful for sequences of moderate 
length and upwards. 


2. The random sequence of length $n_1 + n_2 = N$, in which we take account of order, can be considered 
as a finite population of the first N integers. The ranks of the first sample will be $n_1$ elements randomly 
drawn from this finite population without replacement. The moments of any symmetric function of the 
first sample can therefore be written down immediately either from Isserlis’s results (1931) or by using 
the procedure devised by Irwin & Kendall (1944) or from Wishart’s tables (1952). Such moments will 
be in terms of the K-statistics of the first N integers which can themselves be obtained from the moments 
of the first N integers. Thus 

$$K_1 = \frac{N+1}{2}, \quad K_2 = \frac{N(N+1)}{12}, \quad K_3 = 0, \quad K_4 = -\frac{N^2(N+1)^2}{120},$$

$$K_6 = \frac{N^3(N+1)^3}{252} - \frac{N^2(N+1)^2}{504},$$

$$K_8 = -\frac{N^2(N+1)^2}{720}\{3N^2(N+1)^2 - 4N(N+1) + 2\},$$

and so on. 
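
The first four K-statistics of the integers 1, ..., N can be verified by exact arithmetic. The sketch below applies the standard expressions for $k_2$, $k_3$ and $k_4$ in terms of central moments to the whole finite population; the function name is illustrative, and rational arithmetic is used so that the comparison with the closed forms is exact.

```python
from fractions import Fraction

def k_statistics(values):
    """Return (k1, k2, k3, k4) of a set of values, computed exactly."""
    n = len(values)
    mean = Fraction(sum(values), n)
    # Central moments with divisor n.
    m2, m3, m4 = (sum((Fraction(v) - mean) ** p for v in values) / n
                  for p in (2, 3, 4))
    k2 = n * m2 / (n - 1)
    k3 = n**2 * m3 / ((n - 1) * (n - 2))
    k4 = n**2 * ((n + 1) * m4 - 3 * (n - 1) * m2**2) / ((n - 1) * (n - 2) * (n - 3))
    return mean, k2, k3, k4

N = 10
K1, K2, K3, K4 = k_statistics(list(range(1, N + 1)))
print(K1, K2, K3, K4)
```

For N = 10 the closed forms give $K_1$ = 11/2, $K_2$ = 55/6, $K_3$ = 0 and $K_4$ = −605/6.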


3. The mean of the ranks of one sample. Abdel-Aty (1954) has shown that if we write 
M(1") = én(k,— Ky)’, 





a 
5 Sa a ren ae 
= 9s 
~n N’ 


where N is the number in the finite population and n that in the sample, then 
M(1)=0, M(1*)=K,A;, M(1*)=K,A;, M(1*) = K,Ay+3Ko_A3. 


Let $x_1, x_2, \ldots, x_{n_1}$ be the ranks of the individuals of the sample of $n_1$, and $\bar{x}_1$ the mean of these ranks. 
Then we have 

$$\kappa_1(\bar{x}_1) = \frac{N+1}{2}, \quad \kappa_2(\bar{x}_1) = \frac{n_2(N+1)}{12n_1}, \quad \kappa_3(\bar{x}_1) = 0,$$

$$\kappa_4(\bar{x}_1) = -\frac{(N+1)n_2}{120n_1^3}\{N(N+1) - n_1n_2\}.$$

The distribution of $\bar{x}_1$ will tend to normality as N increases provided $n_1/N$ remains fixed. To test therefore 
for equivalence of the population parameters of location with the alternative hypothesis that they may 
be different we would take the test criterion 

$$\frac{\bar{x}_1 - \tfrac{1}{2}(N+1)}{\left\{\dfrac{n_2(N+1)}{12n_1}\right\}^{1/2}}$$

and refer it to normal tables. This is Mann & Whitney’s procedure for N reasonably large and neither 
$n_1$ nor $n_2$ small. 
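
The test criterion based on the mean rank can be sketched as follows. The ranks used in the illustration are hypothetical and the function name is an assumption; the variance is $n_2(N+1)/(12n_1)$, i.e. $K_2(1/n_1 - 1/N)$.

```python
import math

def mean_rank_criterion(ranks_first_sample, N):
    """Standardized deviate of the mean rank of the first sample among N ranks."""
    n1 = len(ranks_first_sample)
    n2 = N - n1
    mean_rank = sum(ranks_first_sample) / n1
    variance = n2 * (N + 1) / (12 * n1)
    return (mean_rank - (N + 1) / 2) / math.sqrt(variance)

# Hypothetical example: the first sample holds ranks 1, 2, 3 and 5 out of N = 10.
z = mean_rank_criterion([1, 2, 3, 5], 10)
print(round(z, 3))
```

With samples as small as in this illustration the normal approximation would of course be rough; the numbers serve only to show the arithmetic.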





4. The variance of a sample of $n_1$ individuals drawn from a finite population of N we define as 

$$k_2 = \frac{1}{n_1 - 1}\sum_{i=1}^{n_1}(x_i - \bar{x}_1)^2.$$

The moments of $k_2$ have already been given in the form of multiple subscript K’s by Wishart. Thus 

$$M(2) = 0, \qquad M(2^2) = \left(\frac{1}{n_1} - \frac{1}{N}\right)K_4 + 2\left(\frac{1}{n_1 - 1} - \frac{1}{N - 1}\right)K_{22},$$

with longer expressions for $M(2^3)$ and $M(2^4)$ which we do not reproduce here. Irwin & Kendall give the 
second moment as 

$$M(2^2) = \mathcal{E}_N(k_2 - K_2)^2 = \frac{n_2(Nn_1 - N - n_1 - 1)}{Nn_1(n_1 - 1)(N + 1)}K_4 + \frac{2n_2}{(N + 1)(n_1 - 1)}K_2^2.$$

To test whether there is a difference in dispersion one criterion could be to take the ratio $k_2/K_2$ or, since 
$K_2$ is a constant, just $k_2$. The momental constants of the ratio for sequences of 20 and 30 are given in 
Table 1. The standardized percentage points are obtained by interpolation from Table 42 of Pearson & 
Hartley (1954). For $n_1$ and $n_2$ very different it is clear that an assumption of normality for the distribution 
of a sum of squares will be inadequate unless the sequence is of length at least greater than 30. 


Table 1. Momental constants of $k_2/K_2$ for various values of N and $n_1$ 

                                                                              2½% points 
     N    $n_1$   $M(2^2)/K_2^2$   $\sqrt{M(2^2)}/K_2$   $\sqrt{\beta_1}$   $\beta_1$   $\beta_2$    Upper    Lower 

    20     10       0.0522           0.2284                0.023            0.0005+     2.746      1.95    −1.93 
           12       0.0334           0.1828               −0.047            0.002       2.749      1.92    −1.96 
           14       0.0209           0.1445               −0.122            0.015−      2.730      1.87    −2.00 
           16       0.0119           0.1092               −0.219            0.048       2.674      1.79    −2.04 

    30     15       0.0319           0.1786                0.008            0.000       2.843      1.95    −1.94 
           18       0.0207           0.1439               −0.045            0.002       2.842      1.92    −1.97 
           21       0.0131           0.1142               −0.103            0.011       2.828      1.89    −2.00 
           24       0.0075+          0.0866               −0.180            0.032       2.788      1.885   −2.03 
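
The column $M(2^2)/K_2^2$ of Table 1 can be checked from the Irwin & Kendall form of the second moment together with the values $K_2 = N(N+1)/12$ and $K_4 = -N^2(N+1)^2/120$ for the first N integers. The sketch below is illustrative only (the function name is an assumption) and uses exact rational arithmetic.

```python
from fractions import Fraction

def variance_ratio_second_moment(N, n1):
    """M(2^2)/K_2^2 for ranks 1..N and a first sample of n1 ranks."""
    n2 = N - n1
    K2 = Fraction(N * (N + 1), 12)
    K4 = Fraction(-(N ** 2) * (N + 1) ** 2, 120)
    m22 = (Fraction(n2 * (N * n1 - N - n1 - 1), N * n1 * (n1 - 1) * (N + 1)) * K4
           + Fraction(2 * n2, (N + 1) * (n1 - 1)) * K2 ** 2)
    return m22 / K2 ** 2

for N, n1 in ((20, 10), (20, 12), (20, 14), (20, 16), (30, 15)):
    print(N, n1, round(float(variance_ratio_second_moment(N, n1)), 4))
```

Since $K_4/K_2^2 = -1.2$ whatever the value of N, the ratio depends only on N and $n_1$ through the two coefficients.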








Mood (1954) suggests as a rank criterion to test dispersion, the sum of squares of the deviations of 
one sample from the population mean, ½(N + 1). This is 

$$\sum_{i=1}^{n_1}(x_i - K_1)^2 = (n_1 - 1)k_2 + n_1(k_1 - K_1)^2,$$

i= 
the moments of which can be written down immediately from Wishart’s tables. Which criterion is the 
more sensitive to changes in the dispersions of the two populations generating the samples it does not 
appear as yet possible to say. It is also difficult to see whether a criterion based on the sums of squares 
of both samples, e.g. 

— kq(1) + ky(2) 


will be an improvement on a criterion based on a single sample, but probably little sensitivity will be 
gained if this is tried. Whatever criterion is used it will be possible from Wishart’s tables to write down 
its moments either exactly or to any order of approximation desired and to approximate to its probability 
distribution (for sequences of reasonable length) by means of a smooth curve. 

Tests based on symmetric functions of all the ranks of a single sample should be more discriminating 
than tests based on ordered variables, such as range for example. One point should, however, be mentioned 
about all ranked tests for dispersion differences. It may be that at the same time as there is a 
change in the dispersion parameters of the populations generating the sample, there is also a change in 
the location parameters which will effectively mask it. It does not seem possible to devise a single criterion 
based on ranks which would take this into account. 


5. Correction for ties. Corrections for ties have been discussed recently by Putter (1955) who gives 
a list of references. Provided the sequence is long enough and the ties not too frequent a straightforward 
and easy allowance can be made. Suppose we have v elements which tie at rank r. The usual method of 
assigning ranks is to give each of the v elements the same rank, r + ½(v − 1). $K_1$ is then unaltered but all 
the other K-statistics of the population are changed. We can calculate the K-statistics of this changed 
population and use these values in the moments of whatever criterion we are interested in. For $n_1$ and 
$n_2$ large and for the number of sets of ties not too small the effect appears negligible. 













REFERENCES 


Abdel-Aty, S. H. (1954). Biometrika, 41, 253. 
Irwin, J. O. & Kendall, M. G. (1944). Ann. Eugen., Lond., 12, 138. 
Isserlis, L. (1931). Proc. Roy. Soc. A, 132, 586. 
Kamat, A. R. (1956). Biometrika, (in the Press). 
Mood, A. M. (1954). Ann. Math. Statist. 25, 514. 
Pearson, E. S. & Hartley, H. O. (1954). Biometrika Tables for Statisticians, 1. 
Cambridge University Press. 
Putter, J. (1955). Ann. Math. Statist. 26, 268. 
Rosenbaum, S. (1953). Ann. Math. Statist. 24, 663. 
Wishart, J. (1952). Biometrika, 39, 1. 


Likelihood function for capture-recapture samples 


By N. E. G. GILBERT 


John Innes Horticultural Institution 


When capture-recapture methods are used on a wild population, the sampling fraction is usually small, 
so that the distribution of marked and unmarked individuals in the recapture samples can be treated as 
binomial. However, samples of insects have sometimes included the greater part of the population. 
When a chain of such samples is used to estimate several statistics (e.g. population size and birth- and 
death-rates), the appropriate hypergeometric distribution is intractable. This note describes a simple 
way of correcting the error which is introduced when the binomial distribution is used instead of the 
hypergeometric. The population of size N contains n marked individuals, and one sample in the chain 
is of size X of which x are marked. The situation will, of course, be more complicated than this in practice. 
Chapman (1951) suggests minimizing 

$$\sum\frac{X(x/X - n/N)^2}{(x/X)(1 - x/X)(1 - x/n)},$$

where the summation extends over all samples in the chain. These estimates are asymptotically efficient; 
but since x/X is used instead of n/N in the denominator, they can be improved upon in the present case 
where information about N is obtainable from the other samples in the chain. This point is illustrated 
in the numerical example. 

A better weighting of the contributions from the different samples is obtained by minimizing the more 
usual expression for χ² with denominator (n/N)(1 − n/N)(1 − X/N), but the solution of the resulting 
equation involves heavy arithmetic. 

Using the binomial approximation, the likelihood function is 

$$\log L = \sum\{x\log(n/N) + (X - x)\log(1 - n/N)\},$$

so that 

$$\frac{d(\log L)}{dN} = \sum\frac{nX - Nx}{N(N - n)}.$$

This expression may also be obtained by differentiating the numerator only of 

$$\chi^2 = \sum\frac{(Nx - nX)^2}{nX(N - n)}$$

w.r.t. N. 

In the hypergeometric case, each term in this expression for χ² is divided by a further (1 − X/N). The 
maximum-likelihood equation, when similarly modified, becomes 

$$\frac{d(\log L)}{dN} = \sum\frac{nX - Nx}{(N - X)(N - n)},$$

and the resulting estimates differ negligibly from those obtained by using the hypergeometric likelihood, 
although the arithmetic involved is obviously no worse than for the binomial case. 

Example. Suppose that eight individuals from a population with zero births and deaths are marked 
and released; that a first recapture sample of eight contains five marked individuals, so that the total 
number marked is now eleven; and that a second sample of thirteen contains five marked individuals. 
(With zero births and deaths, the distribution of the two kinds of mark affords no extra information.) 




This provides a simple numerical illustration; the two sampling fractions are different but not too much 
so (i.e. the estimation is not dominated by the information from one sample), and the two estimates of 
N from the individual samples differ (but not significantly). 

The estimates are 

    Chapman’s minimum χ²     N = 20.4 
    Binomial                 N = 21.7 ± 4.91 
    Modified binomial        N = 23.1 ± 3.68 
    Hypergeometric           N = 23.2 ± 3.54 
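
The first three point estimates can be recomputed from the two recapture samples of the example (n = 8, X = 8, x = 5 and n = 11, X = 13, x = 5). The sketch below solves the binomial and modified maximum-likelihood equations by bisection and obtains the minimum-χ² value in closed form; the function names, the bisection interval and the tolerance are illustrative assumptions, and the standard errors and the full hypergeometric solution are not attempted.

```python
# Each recapture sample: (n marked in population, X sample size, x marked in sample).
samples = [(8, 8, 5), (11, 13, 5)]

def binomial_score(N):
    """d(log L)/dN under the binomial approximation."""
    return sum((n * X - N * x) / (N * (N - n)) for n, X, x in samples)

def modified_score(N):
    """Score with each term divided by a further (1 - X/N)."""
    return sum((n * X - N * x) / ((N - X) * (N - n)) for n, X, x in samples)

def bisect_root(f, lo, hi, tol=1e-10):
    """Bisection, assuming f(lo) > 0 > f(hi)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def chapman_estimate():
    """Minimum chi-square estimate; the weights are free of N, so the minimum
    over 1/N is a weighted least-squares ratio."""
    num = den = 0.0
    for n, X, x in samples:
        a = x / X
        w = X / (a * (1 - a) * (1 - x / n))
        num += w * n * a
        den += w * n * n
    return den / num

N_binomial = bisect_root(binomial_score, 14.0, 100.0)
N_modified = bisect_root(modified_score, 14.0, 100.0)
N_chapman = chapman_estimate()
print(round(N_chapman, 1), round(N_binomial, 1), round(N_modified, 1))
```

The lower end of the bisection interval is taken just above the largest sample size, since N cannot be smaller than the number of individuals actually seen.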


REFERENCE 
Chapman, D. G. (1951). Univ. Calif. Publ. Statist. 1, 131. 


Tables of Poisson power moments 


By J. B. DOUGLAS 
School of Mathematics, N.S.W. University of Technology, P.O. Box 1, Kensington, New South Wales 


In the fitting of the complete and truncated Neyman Type A contagious distribution by maximum- 
likelihood methods, the rather tedious arithmetic can be considerably reduced by use of tables of moments 
about the origin of the Poisson distribution. Appropriate tables, to 3 or 4 (and occasionally 5) significant 
digits, with a description of their method of use, were published in Biometrics, 11 (1955), 149-73. 

The tables are in fact of ratios of moments. If

$$\mu_x = e^{-\lambda} \sum_{r=0}^{\infty} r^x \lambda^r / r!,$$

then $p_x = \mu_{x+1}/\mu_x$ and $q_x = p_x(p_{x+1} - p_x)$ are tabulated for

x = 0(1)19,
λ = 0.000 (0.001) 0.03 (0.01) 0.3 (0.1) 3.0.
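
For readers who wish to regenerate such ratios, the sketch below computes the Poisson moments about the origin from the standard recurrence μ_{x+1} = λ Σ_k C(x, k) μ_k (with μ_0 = 1) and then forms the ratio of successive moments. It is an illustration only, not the method used to prepare the published or manuscript tables; the function names are hypothetical, and λ must be positive for the ratios to be defined.

```python
from math import comb


def poisson_raw_moments(lam, max_order):
    """Moments about the origin mu_0 .. mu_max_order of a Poisson(lam) variate.

    Uses the recurrence mu_{x+1} = lam * sum_{k=0}^{x} C(x, k) * mu_k.
    """
    mu = [1.0]
    for x in range(max_order):
        mu.append(lam * sum(comb(x, k) * mu[k] for k in range(x + 1)))
    return mu


def moment_ratios(lam, max_order):
    """Ratios mu_{x+1} / mu_x for x = 0 .. max_order (requires lam > 0)."""
    mu = poisson_raw_moments(lam, max_order + 1)
    return [mu[x + 1] / mu[x] for x in range(max_order + 1)]


# With lam = 1 the moments are the Bell numbers 1, 1, 2, 5, 15, ...
print(poisson_raw_moments(1.0, 4))
print(moment_ratios(1.0, 2))
```

Because the moments grow rapidly with x, a table extending to x = 19 at 6 or 7 significant digits would in practice call for some care over rounding; ordinary double precision is assumed to be adequate for this illustration.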


However, these tables also exist in manuscript form with values of $p_x$ calculated correct to 6 or 7
significant digits (and corresponding values of $q_x$), the ranges of tabulation being those given above.
The existence of these more detailed tables is therefore recorded here; the author would be glad to learn 
of any applications where their use would be desirable. 





[ 490 ] 


REVIEWS 


Theory of Games and Statistical Decisions. By D. BLACKWELL and M. A. GIRSHICK.*
New York: John Wiley and Sons, Inc.; London: Chapman and Hall, Ltd. 1954. 
Pp. xi+ 355. 60s. 


Readers of this journal will have seen Barnard’s detailed review of A. Wald’s Statistical Decision 
Functions (Biometrika, 1953, 40, 475-7). The present book shows how statistical problems of various 
types can be handled via games theory and decision functions. It is intended as a text for first-year 
graduate students who may, however, find it heavy going; they will be familiar with the statistics 
required, but not with a number of other topics mostly of a topological nature. The authors have tried 
to make the book self-contained as regards most of these, but even so it will never be light reading. The 
detailed bibliography provided, together with a reading list for each chapter, will help to overcome 
some of these difficulties. 

The first few chapters are devoted to a thorough exposition of games theory; since the whole treat- 
ment is in terms of that theory this was necessary. Even so, a preliminary reading of McKinsey’s 
Introduction to the Theory of Games will be useful. A general description of statistical games (i.e. of 
statistical problems viewed as games between nature and the statistician) rounds off this part. This is 
followed by a chapter on utility and principles of choice which is, unfortunately, too short to provide 
more than a statement of utility theory in Ramsey—von Neumann terms and of the principles of choice 
known as minimax, minimax regret and Bayes. The reader is not made aware that other ‘reasonable’ 
criteria exist as well—presumably the authors had to draw a line somewhere. The rest of the book 
deals with theory of estimation, built on those foundations. There are two chapters on fixed sample-size 
games and two on sequential procedures. The final chapters are concerned with theory of estimation 
and comparison of experiments; they are most interesting and rewarding. 

The reader will be pleasantly surprised to find that his way of making decisions following classical 
theory is well founded even by the exacting standards set by the authors. It should be noted that 
‘decisions’ include quite properly estimation and not only design of experiments or sequential pro- 
cedures. The fact that decision theory supports normal statistical habits is reassuring—which does not 
mean that it is an unnecessary elaboration of the obvious, since a number of new applications follow 
from the new outlook. 

The practical man will probably require another book giving more detailed recipes and leaving out 
some of the rigours. Ultimately he will have to be grateful to the authors whose work makes the writing 
of such a book possible. The quest for generality has made the present book more difficult to read 
than it should be (see, for example, theorem 2.7.2 and its proof), but this might be considered a merit 
in an advanced text. If decision theory resembles a steam hammer it is also true that the statistician’s 
universe is not all nuts. 

The book contains many stimulating exercises and is well produced. G. MORTON 


Elementary Statistics for Students of Social Science and Business. By R. C. 
SPROWLS. London: McGraw-Hill Book Co. Inc. 1955. Pp. 392. 41s. 6d.


Basic Statistical Concepts. By J. K. ADAMS. London: McGraw-Hill Book Co. Inc.
1955. Pp. 304. 41s. 6d.


Elementary Statistics for Students of Social Science and Business is a text-book on the rudiments of 
statistics and probability with most of the mathematical techniques omitted. For this reason much 
explanation and example are necessary and these the author gives. Nine chapters are devoted to 
probability (including the binomial distribution) and methods of measuring location and dispersion, 
and the tenth covers linear regression. Thus far the statistics will be elementary for all. The author, 
who is an Assistant Professor of Business Statistics, then shows his bias by discussing the analysis of 


* It is sad to note that M. A. Girshick died on March 2, 1955 at the early age of 46, with his 
research and teaching activities in full swing. An obituary appeared in the June 1955 number of 
The American Statistician.

time series and index numbers, two branches of statistics favoured by economists and business men. 
At the level at which the author aims the book is reasonably adequate. It is frankly utilitarian and 
conveys nothing of the excitement and interest attendant on the analysis of statistical problems. 
Certain topics might well have been omitted or at least accurately discussed. For example, it is 
questionable whether the statement that maximum-likelihood estimates are unbiased is sensible, 
since often maximum likelihood and bias seem to go hand in hand. 

The reviewer would judge this book as covering much of the work which would be done by a 
first year student in an English University reading statistics ancillary to another subject. Such students
undoubtedly find it easier to think in words rather than symbols and by them this book may be found 
helpful. 

Mr Adams, who writes Basic Statistical Concepts, is an Assistant Professor of Psychology. This book 
is not, however, a combination of factor analysis and ‘stat-rats’ as might possibly be expected, but a 
straightforward exposition of elementary statistics covering, from a less mathematical point of view, 
much the same ground already covered by A. M. Mood, Introduction to the Theory of Statistics (McGraw- 
Hill). There is no question in the reviewer’s mind which book is to be preferred. Given the writer’s 
objective, however, the exposition is tolerable if uninspired, and may be suitable as a textbook for 
the type of student which he has in mind. It is interesting to note that in our parent subject of 
mathematics every teacher of mathematics does not find it necessary to write a text-book for the use
of his classes—in fact very few do. Immediately after the war when the spate of statistical text-books 
began it seemed to be a tribute to the virility of our subject that so many persons should feel the 
urge of exposition. Looking back over the decade one realizes that it is probably because the inter- 
pretation of statistical mathematics is essentially subjective, and to each teacher the techniques and 
basic framework mean something slightly different. Nevertheless, the reviewer would suggest that the 
time has come to call a halt. Enough books on statistical methods now exist to suit most points of 
view even if it is necessary to indicate while teaching different emphasis where it appears desirable. 
In ten years we have grown up and what our subject now requires is not more and more verbiage on 
the elements but a series of specialist monographs each on a separate branch, a process which is already 
under way in the design of experiments. F. N. DAVID 


Experimental Design and its Statistical Basis. By D. J. Finney. London: Cam- 
bridge University Press. 1955. Pp. xi+ 169. 30s. 


Experimental Design. Theory and Application. By W. T. FEDERER. New York
and London: The Macmillan Company. 1956. Pp. 544+ 47. 77s. 


It is said that Beethoven when young divided his time between the study of counterpoint and the 
composition of music without becoming aware that the one was supposed to be an aid to the other. 
Many applied statisticians will sympathize with him in this. They spend their lives trying to under- 
stand the purpose and background of research programmes, debating the merits of different sets of 
treatments, and assessing the need to make a certain comparison especially accurate, and usually 
conclude by designing something quite simple in randomized blocks. In books, however, they live 
in a world of hyper-graeco-latin-cubes and super-magic latin squares; while, if Prof. Federer’s early
pages are to be believed, the selection of treatments and of characteristics to be measured are ‘non- 
statistical’. He may be right, but Dr Finney would not agree. 

Dr Finney’s book is at once remarkable and remarkably good. It aims not at teaching statistics 
but at explaining what statisticians do. It discusses experimental design from the point of view of 
logic and common sense, and shows how many scientific arguments can be expressed in numerical 
form. There would be little point in a chapter-to-chapter critique; it will perhaps suffice to say that 
there is now at least one statistical laboratory where biologists asking for instruction will be asked 
first to read this book. They will not understand the chapter on fractional replication and confounding, 
but they should thereafter see statistical methods as a systematization of what they have previously 
done unmethodically, not as an alien culture imposed by a conquering power on the ancient civilization 
of medical and biological science. 

Prof. Federer’s book is different in purpose; it deals with experimental design only in the narrowest 
methodological sense, and within those limits is concerned to be comprehensive and detailed, which it 
certainly is. Also, it has a bibliography that is well-chosen and extensive. Nevertheless, for all its 
merits this is a disconcerting book to read. Thus, on p. 307 there starts a passage about ‘lattices’ and 
‘incomplete blocks’ that the reviewer still found incomprehensible after much rereading, nor was he
helped by finding a table subjoined headed, ‘Incomplete blocks in a randomized complete block design’. 
Light dawned on p. 314 from a passage, which in any other context would have been a masterpiece 
of obscurity, from which it was gathered that the two terms were synonymous, and that incomplete 
blocks in complete block designs were what most people call lattices. This is perhaps an extreme 
example, but others nearly as bad could be adduced. Nevertheless, on account of its detailed com- 
pleteness this book is to be commended. S. C. PEARCE 


Population Genetics. By CHING CHUN LI. London: Cambridge University Press, for
University of Chicago Press. 1955. Pp. xi+366. 75s. 


This book is a summary of the mathematical theory of population genetics. It begins with the study 
of large random-mating populations, and the Hardy—Weinberg law. It goes on to the consideration 
of correlations between relatives, and the modifications required for linked and sex-linked genes, and 
for autopolyploids. Various schemes of inbreeding are then discussed, such as selfing and sib-mating. 
Path coefficients are explained, and used in the study of correlation and more complicated systems of 
inbreeding. The author goes on to discuss the effects of assortative mating, mutation, selection, sub- 
division and migration. Finally, there are two chapters on the random fluctuations which are inevit- 
able in small populations. In short, most of the subjects of major importance in human and population
genetics are included, although in some cases the treatment is disappointingly brief, presumably
in order to keep the size of the book within reasonable limits. But it would be very interesting to have 
more information on, for example, polyploid segregations. The emphasis is entirely on the mathematical 
theory, and only a little is said about the practical analysis of actual data. 

The explanations are extremely lucid, and anyone with a knowledge of quite elementary statistics 
and genetics should be able to follow most of the book without difficulty (even without calculus).
The reviewer has noticed very few errors, although the proof of the ‘simplified version’ of Fisher’s 
fundamental theorem on natural selection (p. 273) is incorrect as it stands, since Li forgets to include 
the change in fitness between one generation (after selection) and the next generation (at birth). 
Also the final discussion on pp. 344-7 is a little confused, and may not take fully into account the 
complications introduced by inbreeding, linkage and assortative mating. And Li does not mention 
that the exact form of at least some of the gene-frequency distributions is rather controversial. But 
these are rather minor blemishes, and on the whole Li has set out the argument very clearly and 
logically, and is also well aware that the mathematical models represent an oversimplified version of
the true biological situation.

The printing is excellent, and the diagrams are clear and helpful. There is a useful bibliography at 
the end. This book can be warmly recommended to anyone interested in the mathematical treatment 
of the genetical structure and evolution of populations. CEDRIC A. B. SMITH 


Statistical Methods (third edition). By FREDERICK C. MILLS. London: Sir Isaac
Pitman and Sons, Ltd. 1955. Pp. xviii+ 842. 50s. 


This third edition of Prof. Mills’s well-known text-book has been substantially rewritten ‘to take 
account of the more important of recent developments (in statistical theory) that bear on the applica- 
tions of statistics in the social sciences, in business administration, and in governmental affairs’. 
It is addressed to the non-mathematical reader. Concepts and formulae are stated, and their applica- 
tions explained, with examples (based, as is to be expected in an American text-book, on American 
statistics) worked in detail; the mathematical reasoning on which the formulae are based is generally 
omitted. Among the topics included are the description of frequency distributions, problems of estima- 
tion, tests of hypotheses, the analysis of variance, simple and multiple correlation, methods of sampling, 
the analysis of time series, and index-numbers. 

The presentation is clear and unambiguous without being unusually inspiring, and the book should 
be intelligible to any reasonably intelligent reader who has determined to understand the elements of 
statistical method; but, as in many other books on the subject, the demand on the reader’s acquaint- 
ance with elementary algebra varies from chapter to chapter. Inevitably in an exposition of this scope, 
the treatment of relatively advanced topics omits some important qualifications and special cases; 
and in this, of course, lies the danger that the student may misapply the methods. 

This is not a book—nor is it claimed to be one—for the industrial statistician; quality control and 
production control are not treated, nor is the design of experiments. More surprisingly, since the book
is intended to deal with the use of statistics in business administration, there is no reference to market 
research; and while random, stratified, systematic and cluster sampling are described, I have found no
reference, even in condemnation, to quota sampling. One might also have expected, in a work at this
level, some discussion of the collection of statistical data, the importance of definitions, margins of
observational error, and similar matters which it is essential to consider when dealing with social, 
economic, business and administrative statistics. 

The perfect text-book of elementary statistics still remains to be written, and perhaps it is an 
impossibility, but the book under review will suit very many students. Their attention is directed to 
the ‘errata sheet’ which precedes the first page of the text; this relates to the bibliography on pp. 821-
30, from which, by an oversight, the numbers were omitted without which references in the text are
meaningless. FREDERICK BROWN


The Art of Investment. By A. G. ELLINGER. London: Bowes and Bowes. 1955.
Pp. 170. 15s. 


Part I of Mr Ellinger’s book briefly sketches the mechanisms of the London Stock Markets, and details 
the recent history of security price-movements. Parts II and III are concerned with investment 
strategies and decisions, considered as functions of the movements in charts of prices, yields and 
volumes of share trading. These charts are taken as embodying the collective wisdom of investors. 
Since it is unprofitable to invest against market trends, the practical problem is to set up criteria for 
deciding the directions of the main trends. Mr Ellinger’s system is simple, although hedged about 
with rather imprecise qualifications and exceptions. It depends ultimately on the linearity of trends 
in the time series, charted on a logarithmic scale. (The unattractive consequence, that the trends on 
a natural scale are exponential, is not discussed.) 

The important question is: does Mr Ellinger’s system work reasonably well? The evidence presented 
is suggestive, but far from conclusive. I should like Mr Ellinger to formulate his system categorically 
and exhaustively, and apply it publicly to a random sample of securities selected by someone else. The 
results would be more convincing than any amount of examination of the past. A. STUART 


Measuring Business Changes; a Handbook of Significant Business Indicators. 
By RICHARD M. SNYDER. New York: John Wiley and Sons, Inc.; London: Chapman
and Hall, Ltd. 1955. Pp. 382. 64s.


Mr Snyder’s reference book describes and explains about fifty available economic ‘indicators’ for the 
United States economy, ranging over National Income, Population Growth, Labour Statistics,
Commodity Prices, Indices of Production and Activity (both general and specific to some industries),
Domestic and International Trade, Finance and Stock Markets. It is clearly an indispensable source 
for anyone seeking published information in this field, and gives ground for hope that a similar single 
source book will become available for the United Kingdom. A. STUART 


Statistics (second edition). By L. H. C. TIPPETT. Oxford University Press. 1956.
Pp. 224. 7s. 6d.


This book, already a classic in the ‘expositions-for-the-layman’ group, has been revised and brought
up to date by the author. The general plan and coverage of the book remain unchanged. 
F. N. DAVID 


Tafeln zum Vergleich zweier Stichproben mittels X-Test und Zeichentest 
(Tables for comparing two samples by X-Test and Sign Test.) By B. L. VAN DER
WAERDEN and E. NIEVERGELT. Berlin: Springer-Verlag. 1956. Pp. 34. DM. 4.80.

This booklet describes the method of computation and reference to tables of the pseudo-‘t’
order-statistic ‘X’, which van der Waerden proposed as a test of difference of central tendency in two
samples (Math. Ann. (1953), 126, 93). A similar discussion of the sign test analogue of the matched-pairs
t-test is also given.

Part II (in German) gives a worked example and Part IV is the English translation of this. Part III
consists of tables (largely derived more or less directly from standard normal and binomial tables) 
and Part I gives, in German, a discussion of the relative merits of different tests and the method of 
computation of the tables. Since the X-test is applicable in exactly the same circumstances as Mann 
& Whitney’s generalization of Wilcoxon’s test and since, as van der Waerden notes in his comparative 
paper (Proc. K. Acad. Wet. Amst. A (1953), 56, 310), the asymptotic relative efficiency of this as 
compared with Student’s t is √(3/π) = 0.977 when there is normal variation, the reviewer at least was
not convinced of the need for a new test nor of its superiority to Wilcoxon’s. 

D. E. BARTON 


PUBLICATIONS OF THE U.S. DEPARTMENT OF COMMERCE, 
NATIONAL BUREAU OF STANDARDS 
(i) Tables of the error function and its derivative. Applied Mathematics Series 41. 
1954. [Re-issue of the 1941 Mathematical Table No. 8.] Pp. xi+ 302. $3.25. 


The main table provides values of the functions 

$$H(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-\alpha^2}\, d\alpha \quad\text{and}\quad H'(x) = \frac{2}{\sqrt{\pi}}\, e^{-x^2}$$

to 15 decimal places, for argument x = 0(0.0001)1.0000(0.001)5.600. A short supplementary table
gives values of 1 − H(x) and H′(x) to eight significant figures for x = 4.00(0.01)10.00.
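
As a rough modern cross-check of entries of this kind, the error function and its derivative can be evaluated with the Python standard library. The sketch below is illustrative only: it assumes the tabulated function is the usual error function, the names `H`, `H_prime` and `upper_tail` are hypothetical, and double precision gives about 15 to 16 significant figures rather than the guaranteed 15 decimals of the printed table.

```python
import math


def H(x):
    """Error function: (2 / sqrt(pi)) times the integral of exp(-a^2) from 0 to x."""
    return math.erf(x)


def H_prime(x):
    """Derivative of the error function: (2 / sqrt(pi)) * exp(-x^2)."""
    return 2.0 / math.sqrt(math.pi) * math.exp(-x * x)


def upper_tail(x):
    """1 - H(x), computed with erfc to avoid cancellation for large x."""
    return math.erfc(x)


for x in (0.0, 0.5, 1.0, 4.0):
    print(x, H(x), H_prime(x), upper_tail(x))
```

Using `math.erfc` for the tail mirrors the purpose of the supplementary table: for large x, forming 1 − H(x) by subtraction would lose most of the significant figures.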


(ii) Table of sine and cosine integrals for arguments from 10 to 100. Applied 
Mathematics Series 32. 1954. [Re-issue of the 1942 Mathematical Table No. 13.] 
Pp. xv +187. $2.25. 
The main table provides values of the functions 

$$\mathrm{Si}(x) = \int_0^x \frac{\sin t}{t}\, dt \quad\text{and}\quad \mathrm{Ci}(x) = \int_\infty^x \frac{\cos t}{t}\, dt$$

with second differences, to 10 decimal places, for argument x = 10.00(0.01)100.00. In addition, to
facilitate interpolation, auxiliary tables are given of ½p(1−p) and ⅙p(1−p²) for p = 0(0.001)1.000.
There is also a table to 15 decimal places of multiples of ½π, i.e. of ½nπ for n = 1(1)100.
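
The sine and cosine integrals can be approximated by direct quadrature, which gives a simple independent check on individual entries. The sketch below is an illustration under stated assumptions, not the method used to compute the table: it applies composite Simpson's rule, evaluates Ci(x) through the identity Ci(x) = γ + ln x + ∫₀ˣ (cos t − 1)/t dt, and uses hypothetical function names and an arbitrary panel count.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant


def simpson(f, a, b, panels=20000):
    """Composite Simpson's rule on [a, b] with an even number of panels."""
    if panels % 2:
        panels += 1
    h = (b - a) / panels
    total = f(a) + f(b)
    for i in range(1, panels):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def sine_integral(x):
    """Si(x): integral of sin(t)/t from 0 to x (integrand taken as 1 at t = 0)."""
    return simpson(lambda t: math.sin(t) / t if t else 1.0, 0.0, x)


def cosine_integral(x):
    """Ci(x) for x > 0, via gamma + ln x + integral of (cos t - 1)/t from 0 to x."""
    integrand = lambda t: (math.cos(t) - 1.0) / t if t else 0.0
    return EULER_GAMMA + math.log(x) + simpson(integrand, 0.0, x)


print(sine_integral(10.0), cosine_integral(10.0))
```

This quadrature is adequate for about eight decimals over the tabulated range; matching the full 10 decimals of the table throughout would call for asymptotic expansions or a library routine.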


(iii) Table of the gamma function for complex arguments. Applied Mathematics 
Series 34. 1954. Pp. xvi+ 105. $2.00. 

This table gives the real and imaginary parts of log_e Γ(z) for z = x + iy, and x, y = 0.0(0.1)10, each to
12 decimal places. Auxiliary tables of sin πx, cos πx, sinh πx and cosh πx are given to 15 decimal places
(or 15 significant figures) for x = 0(0.1)10.0 to facilitate extension of the scope of the table.










OTHER BOOKS RECEIVED 


Famous Problems and other Monographs. By F. KLEIN et al. New York: Chelsea
Publishing Company. 1955. Pp. 321. $2.25.


Income of American People. By H. P. MILLER. New York: John Wiley and Sons,
Inc.; London: Chapman and Hall, Ltd. 1955. Pp. xvi+206. 44s.


Analysis of Confounded Factorial Experiments in Single Replication. By F. E.
BINET et al. North Carolina Agricultural Experiment Station (Technical Bulletin
No. 113). 1955. Pp. 64.


Theoretical Genetics. By R. B. GOLDSCHMIDT. London: Cambridge University Press,
for University of California Press. 1956. Pp. 563. 64s.


Health in Industry (Sickness Absence Statistics). For London Transport Executive by
Butterworth’s Medical Publications. 1956. Pp. 177. 35s.


Elementary Topology. By D. W. HALL and G. L. SPENCER. New York: John Wiley
and Sons, Inc.; London: Chapman and Hall, Ltd. 1955. Pp. 303. 56s.


Rectangular-Polar Conversion Tables. By E. H. NEVILLE. London: Cambridge
University Press, for the Royal Society. 1956. Pp. xxxii+ 109. 30s.


Irrationalzahlen, 2nd edition. By O. PERRON. New York: Chelsea Publishing Co. 1951.
Pp. 199. $1.50.


Trigonometrical Series, 2nd edition. By A. ZYGMUND. New York: Chelsea Publishing
Co. 1952. Pp. 329. $1.50.


Statistics. By W. A. WALLIS and H. V. ROBERTS. Glencoe, Illinois: The Free Press.
1956. Pp. xii+646. $6.


The Essentials of Educational Statistics. By FRANCIS G. CORNELL. New York: John
Wiley and Sons, Inc.; London: Chapman and Hall, Ltd. 1956. Pp. 375. 46s.


Annual Epidemiological and Vital Statistics 1953. World Health Organization,
Geneva, Switzerland. (U.K. Sales Agent, H.M.S.O.) 1956. Pp. 571. 50s., $10 or
Sw. fr. 30.








[ 496 ] 


CORRIGENDA 


Correlated Random Normal Deviates. Tracts for Computers, No. 26. By E. C.
FIELLER, T. LEWIS and E. S. PEARSON.


These tables were issued by the Department of Statistics, University College, London, and published 
by the Cambridge University Press in 1955. As an experiment in method of reproduction, the tables 
were not set up in type but were reproduced direct by photo-offset process from sheets prepared at the 
National Physical Laboratory on an electromatic typewriter. It was hoped that this method would 
provide a clear facsimile copy of the original sheets, but it has been found unfortunately that this is not 
the case and a number of errors have appeared in the tables.* These consist of 

(a) missing or broken negative signs,

(b) broken figures, 

(c) one incorrect figure and one addition of a negative sign, probably due to printers’ ‘touching up’. 

These errors or opportunities for the misreading of figures are more serious in the present table than 
they would have been in some tables of random numbers or random normal deviates, because on each 
page column totals have been included in the form of Σx, Σx², Σ(x_i x_j) and r(x_i, x_j). Thus there is a risk
of inconsistent results arising from certain uses of the figures. 
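
The kind of inconsistency referred to can be illustrated with a small sketch. The figures below are invented for illustration and are not taken from the Tract; the function name and dictionary keys are likewise hypothetical. The example shows that a lost negative sign leaves the sum of squares unchanged but alters the sum, the sum of products and the correlation, so that printed column totals would no longer agree with totals recomputed from the misread entries.

```python
import math


def column_totals(x1, x2):
    """Sum, sum of squares, sum of products and correlation for two columns."""
    n = len(x1)
    s1, s2 = sum(x1), sum(x2)
    ss1 = sum(v * v for v in x1)
    ss2 = sum(v * v for v in x2)
    sp = sum(a * b for a, b in zip(x1, x2))
    r = (n * sp - s1 * s2) / math.sqrt((n * ss1 - s1 ** 2) * (n * ss2 - s2 ** 2))
    return {"sum_x1": s1, "sumsq_x1": ss1, "sum_x1x2": sp, "r": r}


# Hypothetical columns as intended, and the same columns with one minus sign lost.
x2 = [1.0, -2.0, 3.0, -1.0]
x1_true = [0.5, -1.0, 1.5, -0.5]
x1_misread = [0.5, 1.0, 1.5, -0.5]

print(column_totals(x1_true, x2))
print(column_totals(x1_misread, x2))
```

A check of this sort, comparing recomputed totals with the printed ones, is one way a user might detect a doubtful sign before relying on a page.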

The 60 pages of published table have now been carefully checked with the original sheets. The errors 
found falling under headings (a) and (c) are listed below in full; for (b) only those cases in which real
ambiguity seems likely to arise are given. 

A full correction will be made to the figures in the second impression of the Tract which will be issued 
during the coming year. In the meantime, the authors and publishers wish to express their sincere
regret to all those who have purchased or used the Tract in its present form. E.C.F., T.L., E.S.P.


(a) For the values listed here the negative sign is completely missing, or less than half there


Page Column Row    Page Column Row
1 XY, 24 22 Xs 30 
1 Xe 6* 23 xy 34 
3 Ly 19* 24 xy 46 
4 XL 25 28 Xs 1 
4 Ly 47 and 48 30 Lg 44
5 Xp 46 32 Ly 10 
6 Ly 47 32 x, 5 
6 X, 19 35 L, 44* 
7 Xo 50 36 Lo 22 
8 L5e 42* 37 Xe 29 
11 Ly 47* 37 X, 24 
11 Ly 15 40 Ly 37 
13 Ly 33* 44 2 36* 
13 Ls 24 47 Xs 43* 
13 Xs 25 52 Ly 44 
13 Wy 46 53 Xs, 43 
13 Xs 19 54 X2 15* 
16 Xs 10 56 Xo 19 
18 Xy 37 56 Xs 8 
19 Ze 45* 56 Xs 16 
19 x, 25 56 Ly 31 
19 xy 32 56 Xs 42 
21 vy 10 57 Xs 46 
21 Le 41 59 XL, 15 
21 vg 47 59 Xs 26 


* Negative sign missing altogether. 


* The process of reproduction did not easily allow the customary checking in proof which occurs 
with a printed table. 













(b) These values all have broken figures which might be misread 


Page Column Row Should read 
5 Ly Lr X, 49-8068 
12 Ly 40 —0-37 
16 Ly La — 5-95 
16 om 19 0-83 
25 Ls xx — 6-54 
28 Le 7 — 1-53 
34 Xq 40 — 1-36 
36 Ls 10 2-20 
41 Ls 44 delete mark in front of number which might be taken as a minus sign. Read 1.01


48 ay 1 0-66 
57 ay 50 0-37 
60 ay 9 1-88 


(c) Incorrect additions 


6 Xo 43 For — 0-60 read 0-60 
28 Le 19 For —0-53 read — 0-54 


J. GANI. ‘Some theorems and sufficiency conditions for the maximum likelihood estimator
of an unknown parameter in a simple Markov chain.’ Biometrika, 42, 342-59. 


The statement in line 8 below equation (22) on p. 352 is wrong, and invalidates the proof leading to (23);
this result can be obtained more simply as follows. 

Instead of considering the probabilities $p(x_i, \theta)$ that the $i$th observation of the variate $x$ be $x_i$, write
$p_j(\theta)$ as the probability that the variate $x$ has the value $j$, where $j$ is some integer in the given range.
Let $n_j$ be the frequency with which the value $j$ occurs in the sample, so that $\sum_j n_j = n$, the total number
of observations. Writing the likelihood function as

$$L = \sum_j n_j \ln p_j(\theta),$$

the factorizability condition gives

$$\sum_j n_j \ln p_j(\theta) = \ln f(n_1, n_2, \ldots) + \ln F(T, \theta),$$

where $T = T(n_1, n_2, \ldots)$ is a sufficient estimator of $\theta$. Providing $p_j(\theta)$ and $F(T, \theta)$ are differentiable with
respect to $\theta$, this differentiation will lead to

$$\sum_j n_j u_j(\theta) = \sum_j n_j \frac{\partial}{\partial\theta}\{\ln p_j(\theta)\} = \frac{\partial}{\partial\theta}\{\ln F(T, \theta)\} = G(T, \theta),$$

where $G(T, \theta)$ is some linear function of the frequencies $n_j$.
Now let us assume that $T$ is a function formally differentiable in each of the $n_j$, and that $G(T, \theta)$ is
similarly differentiable in $T$; differentiation of $G(T, \theta)$ with respect to $n_j$ leads to

$$\frac{\partial G(T, \theta)}{\partial T}\,\frac{\partial T}{\partial n_j} = u_j(\theta) \qquad (j = 1, 2, \ldots).$$

Since $T$ is a function of the $n_j$ only, this relation shows that $\partial T/\partial n_j$ can only be some function $1/v_j(T)$
of $T$, so that

$$\frac{\partial G}{\partial T} = u_j(\theta)\, v_j(T) \qquad (j = 1, 2, \ldots).$$

The equality of these products for all $j$ is possible only if

$$u_j(\theta) = c_j K_1(\theta), \qquad v_j(T) = c_j^{-1} v(T),$$

where the $c_j$ are constants, and the form of the function $G(T, \theta)$ is, therefore,

$$G(T, \theta) = K_1(\theta) \int v(T)\, dT + K_2(\theta) = K_1(\theta)\, g(T) + K_2(\theta),$$

as previously obtained in the paper. The function $g(T)$ is clearly

$$g(T) = \int v(T)\, dT = \sum_j \int v(T)\, \frac{\partial T}{\partial n_j}\, dn_j = \sum_j c_j n_j,$$

a linear function of the $n_j$.


The result (23), where the likelihood function is expressed in terms of individual observations $x_i$
instead of the frequencies $n_j$, follows directly.


I am greatly indebted to Mr E. Bowen for pointing out my error, and to Prof. G. A. Barnard for 
suggesting the proof above. 


JOHN WISHART 
1898-1956 


John Wishart, who has been connected with this journal in an editorial capacity since 1937, 
died suddenly in Mexico on 14 July last. An obituary and appreciation will appear in the 
next number. 


E.S.P. 
















TRACTS FOR COMPUTERS 


Department of Statistics, University College, London 


I. Tables of the Digamma and Trigamma Functions. By ELEANOR PAIRMAN, M.A.


1 
bles f S= here the p’s and q’ i 
Tables for summing = io Tt sind ins tat "thy ere the p’s and q s are numerical 





factors. Price 5s. net. 


V. Table of Coefficients of Everett’s Central-Difference Interpolation Formula. By A. J. 
THOMPSON, PH.D. Second edition. Price 7s. 6d. net. 


VIII. Table of the Logarithms of the Complete Γ-Function (to ten decimal places) for
Argument 2 to 1200 beyond Legendre’s Range (Argument 1 to 2). By EGON S. PEARSON,
D.Sc. Price 5s. net.


IX. Log Γ(x) from x=1 to 50.9 by intervals of 0.01. By JOHN BROWNLEE, M.D., D.Sc.
Price 5s. net. 


X. On Quadrature and Cubature or on Methods of Determining Approximately Single 
and Double Integrals. By J. O. Irwin, D.Sc. Price 7s. 6d. net. 


XII. Tables of the Probable Error of the Coefficient of Correlation. By KARL HOLZINGER,
PH.D. Price 5s. net. 


XIII. Bibliotheca Tabularum Mathematicarum, being a Descriptive Catalogue of Mathematical 
Tables. Part I. A, Logarithms of Numbers. By JAMES HENDERSON, PH.D. Price 9s. net. 


XV. Random Sampling Numbers. By L. H. C. TIPPETT, M.Sc., with a Foreword by KARL
PEARSON. Price 5s. net. 


XXIII. Tables of tan⁻¹x and log(1 + x²). To assist in the calculation of the ordinates of a Pearson
Type IV curve. By L. J. COMRIE, PH.D. Price 5s. net.


XXIV. Random Sampling Numbers (2nd Series). By M. G. KENDALL and B. BABINGTON SMITH. 
Price 5s. net. 


XXV. Random Normal Deviates. By HERMAN WoLD. Price 5s. net. 


XXVI. Correlated Random Normal Deviates. By E. C. FIELLER, T. LEWIS and E. S. PEARSON.
Price 10s. 6d. net. 


Nos. II, III, IV, VI and VII are out of print







LOGARITHMETICA BRITANNICA 


A standard Table of Logarithms to Twenty Decimal Places. By A. J. THOMPSON, Ph.D. 
(commenced in 1922 to commemorate the tercentenary of the publication of HENRY BRIGGS’S
Arithmetica Logarithmica). 

The nine separate sections of this Table have now been issued, and the complete work
consisting of the logarithms of numbers 10,000-100,000, together with Dr Thompson’s
General Introduction (98 pp.) is now available in two bound volumes.


Price £8. 8s. 0d.







Issued by the CAMBRIDGE UNIVERSITY PRESS, Bentley House, LONDON, N.W.1 
on behalf of the 
DEPARTMENT OF STATISTICS, UNIVERSITY COLLEGE, LONDON 
and obtainable from any bookseller 





(All rights reserved) 


BIOMETRIKA Vol. 43, Parts 3 and 4 
CONTENTS 


Studies in the history of probability and statistics. III. A note on the ee: of the — presentation of 
data. By Erica Royston ° ° ° ° ° ° . . 


Studies in the history of st and statistics. ‘Iv. A note on an early statistical nie of literary style. 
By C. B. WrittiaMs . ; ‘ - . . ° ° ° 


A goodness of fit test for spectral distribution functions of stationary time series with normal residuals. By
A. M. WALKER
Sufficiency conditions in regular Markov chains and certain random walks. By J. GANI
Some asymptotic distribution theory for Markov chains with a denumerable number of states. By CYRUS
DERMAN


A general method for approximating to the distribution of likelihood ratio criteria. By D. N. LAWLEY
On the accuracy of weighted means and ratios. By G. S. JAMES
On estimating the latent and infectious periods of measles. II. Families with three or more susceptibles. By
NORMAN T. J. BAILEY
Significance tests for a variable chance of infection in chain-binomial theory. By NORMAN T. J. BAILEY
On the variation of yield variance with plot size. By P. WHITTLE
On the construction of significance tests on the circle and the sphere. By G. S. WATSON and E. J. WILLIAMS
Notes on bias in estimation. By M. H. QUENOUILLE


An introduction to some non-parametric generalizations of analysis of variance and multivariate analysis. By
S. N. ROY and S. K. MITRA
A two-sample distribution-free test. By A. R. KAMAT
Sequential analysis applied to certain experimental designs in the analysis of variance. By W. D. RAY
Lognormal approximation to products and quotients. By S. R. BROADBENT
A rejection criterion based upon the range. By C. I. BLISS, W. G. COCHRAN and J. W. TUKEY
Confidence intervals for a proportion. By EDWIN L. CROW
Serial correlation in regression analysis. II. By G. S. WATSON and E. J. HANNAN
MISCELLANEA 
Revised upper percentage points of the extreme studentized deviate from the sample mean. By H. A. DAVID
Exact linear sequential tests for the mean of a normal distribution. By J. TAYLOR
On the sum of squares of normal scores. By H. RUBEN
On the moments of the range and product moments of extreme order statistics in normal samples. By
H. RUBEN
On estimating binomial response relations. By F. J. ANSCOMBE
Existence and uniqueness of a uniformly most powerful randomized unbiased test for the binomial. By
A. A. BLANK


A note on the circular multivariate distribution. By G. S. WATSON
The fitting of regression curves with autocorrelated data. By N. A. Horriy 
Bounds for the variance of Kendall’s rank correlation statistic. By ALAN STUART
A note on the theory of quick tests. By D. R. COX
A note on the signs of gross correlation coefficients and partial correlation coefficients. By ak hina
The estimation of the mean of a censored normal distribution by ordered variables. By P. G. MOORE
A note on Wilcoxon’s and allied tests. By F. N. DAVID
Likelihood function for capture-recapture samples. By N. E. G. GILBERT
Tables of Poisson power moments. By J. B. DOUGLAS
REVIEWS
D. BLACKWELL and M. A. GIRSHICK’s ‘Theory of Games and Statistical Decisions’
R. C. SPROWLS’s ‘Elementary Statistics for Students of Social Science and Business’
J. K. ADAMS’s ‘Basic Statistical Concepts’
D. J. FINNEY’s ‘Experimental Design and its Statistical Basis’
W. T. FEDERER’s ‘Experimental Design. Theory and Application’
CHING CHUN LI’s ‘Population Genetics’
FREDERICK C. MILLS’s ‘Statistical Methods’
A. G. ELLINGER’s ‘The Art of Investment’
RICHARD M. SNYDER’s ‘Measuring Business Changes; a Handbook of Significant Business Indicators’
L. H. C. TIPPETT’s ‘Statistics’
B. L. VAN DER WAERDEN and E. NIEVERGELT’s ‘Tafeln zum Vergleich zweier Stichproben mittels X-Test und
Zeichentest’
NATIONAL BUREAU OF STANDARDS, Publications of the U.S. Department of Commerce, ‘Applied Mathematics
Series 41, 32, 34’
OTHER BOOKS RECEIVED
CORRIGENDA


Printed in Great Britain at the University Press, Cambridge (Brooke Crutchley, University Printer) 













